Literature DB >> 32053594

Coding with transient trajectories in recurrent neural networks.

Abstract

Following a stimulus, the neural response typically strongly varies in time and across neurons before settling to a steady-state. While classical population coding theory disregards the temporal dimension, recent works have argued that trajectories of transient activity can be particularly informative about stimulus identity and may form the basis of computations through dynamics. Yet the dynamical mechanisms needed to generate a population code based on transient trajectories have not been fully elucidated. Here we examine transient coding in a broad class of high-dimensional linear networks of recurrently connected units. We start by reviewing a well-known result that leads to a distinction between two classes of networks: networks in which all inputs lead to weak, decaying transients, and networks in which specific inputs elicit amplified transient responses and are mapped onto output states during the dynamics. Theses two classes are simply distinguished based on the spectrum of the symmetric part of the connectivity matrix. For the second class of networks, which is a sub-class of non-normal networks, we provide a procedure to identify transiently amplified inputs and the corresponding readouts. We first apply these results to standard randomly-connected and two-population networks. We then build minimal, low-rank networks that robustly implement trajectories mapping a specific input onto a specific orthogonal output state. Finally, we demonstrate that the capacity of the obtained networks increases proportionally with their size.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2020 PMID： 32053594 PMCID： PMC7043794 DOI： 10.1371/journal.pcbi.1007655

Source DB: PubMed Journal: PLoS Comput Biol ISSN： 1553-734X Impact factor: 4.475

Introduction

The brain represents sensory stimuli in terms of the collective activity of thousands of neurons. Classical population coding theory describes the relation between stimuli and neural firing in terms of tuning curves, which assign a single number to each neuron in response to a stimulus [1-3]. The activity of a neuron following a stimulus presentation typically strongly varies in time and explores a range of values, but classical population coding typically leaves out such dynamics by considering either time-averaged or steady-state firing. In contrast to this static picture, a number of recent works have argued that the temporal dynamics of population activity may play a key role in neural coding and computations [4-14]. As the temporal response to a stimulus is different for each neuron, an influential approach has been to represent population dynamics in terms of temporal trajectories in the neural state space, where each axis corresponds to the activity of one neuron [15-18]. Coding in this high-dimensional space is typically examined by combining linear decoding and dimensionality-reduction techniques [19-21], and the underlying network is often conceptualised in terms of a dynamical system [18, 22–30]. Such approaches have revealed that the discrimination between stimuli based on neural activity can be higher during the transient phases than at steady state [16], arguing for a coding scheme in terms of neural trajectories. A full theory of coding with transient trajectories is however currently lacking. To produce useful transient coding, the trajectories of neural activity need to satisfy at least three requirements [4]. They need to be (i) stimulus-specific, (ii) robust to noise and (iii) non-monotonic, in the sense that the responses to different stimuli differ more during the transient dynamics than at steady-state. This third condition is crucial as otherwise coding with transients can be reduced to classical, steady-state population coding. Recent works have shown that recurrent networks with so-called non-normal connectivity can lead to amplified transients [28, 31–35], but general sufficient conditions for such amplification were not given. We start by reviewing a well-known result linking the norm of the transient activity to the spectrum of the symmetric part of the connectivity matrix. This results leads to a simple distinction between two classes of networks: networks in which all inputs lead to weak, decaying transients, and networks in which specific inputs elicit transiently amplified responses. We then characterize inputs that lead to non-monotonic trajectories, and show that they induce transient dynamics that map inputs onto orthogonal output directions. We first apply these analyses to standard two-population and randomly-connected networks. We then specifically exploit these results to build low-rank connectivity matrices that implement specific trajectories to transiently encode specified stimuli, and examine the noise-robustness and capacity of this setup.

Results

We study linear networks of N randomly and recurrently coupled rate units with dynamics given by: Such networks can be interpreted as describing the linearized dynamics of a system around an equilibrium state. In this picture, the quantity r represents the deviation of the activity of the unit i from its equilibrium value. For simplicity, in the following we refer to the quantity r as the firing rate of unit i. Here J denotes the effective strength of the connection from neuron j to neuron i. Unless otherwise specified, we consider an arbitrary connectivity matrix J. Along with the recurrent input, each unit i receives an external drive I(t)r0, in which the temporal component I(t) is equal for all neurons, and the vector r0 (normalized to unity) represents the relative amount of input to each neuron.

Monotonic vs. amplified transient trajectories

We focus on the transient autonomous dynamics in the network following a brief input in time (I(t) = δ(t)) along the external input direction r0, which is equivalent to setting the initial condition to r0. The temporal activity of the network in response to this input can be represented as a trajectory r(t) in the high-dimensional space in which the i-th component is the firing rate of neuron i at time t. We assume the network is stable, so that the trajectory asymptotically decays to the equilibrium state that corresponds to r = 0. At intermediate times, depending on the connectivity matrix J and on the initial condition r0, the trajectory can however exhibit two qualitatively different types of behavior: it can either monotonically decay towards the asymptotic state or transiently move away from it (Fig 1A and 1B). We call these two types of trajectories respectively monotonic and amplified.

Fig 1

Monotonically decaying vs. amplified transient dynamics.

Monotonically decaying vs. amplified transient dynamics.

Dynamics of a linear recurrent network in response to a short external perturbation along a given input direction r0. The left and right examples correspond to two different connectivity matrices, where the connection strengths are independently drawn from a Gaussian distribution with zero mean and variance equal to g2/N (left: g = 0.5; right: g = 0.9). A. Firing rate dynamics of 10 individual units. B. Projections of the population activity onto the first two principal components of the dynamics. Yellow and red color correspond respectively to g = 0.5 and g = 0.9. C. Temporal dynamics of the activity norm ∥r(t)∥. Left: in the case of weakly non-normal connectivity the activity norm displays monotonic decaying behaviour for any external input perturbation. Right: for strongly non-normal connectivity, specific stimuli generate a transient increase of the activity norm. N = 200 in simulations. The two types of transient trajectories can be distinguished by looking at the Euclidean distance between the activity at time point t and the asymptotic equilibrium state, given by the activity norm . Focusing on the norm allows us to deal with a single scalar quantity instead of N firing rates. Monotonic and amplified transient trajectories respectively correspond to monotonically decaying and transiently increasing ∥r(t)∥ (Fig 1C). Note that a transiently increasing ∥r(t)∥ necessarily implies that the firing rate of at least one neuron shows a transient increase in its absolute value before decaying to zero. One approach to understanding how the connectivity matrix J determines the transient trajectory is to project the dynamics on the basis formed by the right-eigenvectors {v} of J [36]. The component along the k—th eigenmode decays exponentially and the activity norm can be expressed as: If all the eigenvectors v are mutually orthogonal, then the squared activity norm is a sum of squares of decaying exponentials, and therefore a monotonically decaying function. Connectivity matrices J with all orthogonal eigenvectors are called normal matrices, and they thus generate only monotonic transients. In particular, any symmetric matrix is normal. On the other hand, connectivity matrices for which some eigenvectors are not mutually orthogonal are called non-normal [37]. For such matrices, the second term under the square root in Eq (2) can have positive or negative sign, so that the norm cannot in general be written as the sum of decaying exponentials. It is well known that non-normal matrices can lead to non-monotonic transient trajectories [28, 31–35, 38]. Nonetheless, a non-normal connectivity matrix J is just a necessary, but not a sufficient condition for the existence of transiently amplified trajectories. As will be illustrated below, having non-orthogonal eigenvectors does not guarantee the existence of transiently amplified inputs. This raises the question of identifying the sufficient conditions on the connectivity matrix J and input r0 for the transient trajectory to be amplified. In the following, we point out a simple criterion on the connectivity matrix J for the existence of amplified trajectories, and show that it is possible to identify the input subspace giving rise to amplified trajectories and estimate its dimensionality.

Two classes of non-normal connectivity

To distinguish between monotonic and amplified trajectories, we focus on the rate of change d∥r(t)∥/dt of the activity norm. For a monotonic trajectory, this rate of change is negative at all times, while for amplified trajectories it transiently takes positive values before becoming negative as the activity decays to the equilibrium value. Using this criterion, we can determine the conditions under which a network generates an amplified trajectory for at at least one input r0. Indeed, the rate of change of the activity norm satisfies (see [38, 39]) Here the matrix J denotes the symmetric part of the connectivity matrix J. The right hand side of Eq (3) is a Rayleigh quotient [40]. It reaches its maximum value when r(t) is aligned with the eigenvector of J associated with its largest eigenvalue, λmax(J), and the corresponding maximal rate of change of the activity norm is therefore λmax(J) − 1. Eq (3) directly implies that a necessary and sufficient condition for the existence of transiently amplified trajectories is that the largest eigenvalue of the symmetric part J be larger than unity, λmax(J) > 1 [38]. If that is the case, choosing the initial condition along the eigenvector associated with λmax(J) leads to a positive rate of change of the activity norm at time t = 0, and therefore generates a transient increase of the norm corresponding to an amplified trajectory, which shows the sufficiency of the criterion. Conversely, if a given input produces an amplified trajectory, at least one eigenvalue of J is necessarily larger than one. If that were not the case, the right hand side of the equation for the norm would take negative values for all vectors r(t), implying a monotonic decay of the norm. This demonstrates the necessity of the criterion. The criterion based on the symmetric part of the connectivity matrix allows us to distinguish two classes of connectivity matrices: if λmax(J) < 1 all external inputs r0 lead to monotonically decaying trajectories (non-amplifying connectivity); if λmax(J) < 1 specific input directions lead to a non-monotonic amplified activity norm (amplifying connectivity). The key point here is that for a non-normal connectivity matrix J, the symmetric part J is in general different from J. The condition for the stability of the system () and the condition for transient amplification (λmax(J) > 1) are therefore not mutually exclusive. This is instead the case for normal networks, which include symmetric, anti-symmetric, orthogonal connectivity matrices and trivial one-dimensional dynamics. The simplest illustration of this result is a two-population network. In that case the relationship between the eigenvalues of J and J is straightforward. The eigenvalues of J and J are given by where Tr(J) and Det(J) are the trace and determinant of the full connectivity matrix J, and 2Δ is the difference between the off-diagonal elements of J. Assuming for simplicity that the eigenvalues of J are real, Eq (4) show that the maximal eigenvalue of J is in general larger than the maximal eigenvalue of J, and the difference between the two is controlled by the parameter Δ which quantifies how non-symmetric the matrix J is. If Δ is large enough, J will have an unstable eigenvalue, even if both eigenvalues of J are stable (Fig 2A). The value of Δ therefore allows to distinguish between non-amplifying and amplifying connectivity. Furthermore, for amplifying connectivity, the parameter Δ directly controls the amount of amplification in the network (Fig 2B), defined as the maximum value of the norm ∥r(t)∥ over time and initial conditions r0 (see Methods). A specific example is a network consisting of an excitatory and an inhibitory population [32]. In that case our criterion states that the excitatory feedback needs to be (approximately) larger than unity in order to achieve transient amplification, when k is larger than but close to one (Fig 2C and S6 Text).

Fig 2

Dynamical regimes for a network of two interacting populations.

Dynamical regimes for a network of two interacting populations.

A. Relation between the eigenvalues of the connectivity matrix J (blue dots) and the eigenvalues of its symmetric part, J (red dots). Both pairs of eigenvalues are symmetrically centered around Tr(J)/2, but the eigenvalues of J lie further apart (Eq 4), and the maximal eigenvalue of J can cross unity if the difference 2Δ between the off-diagonal elements of the connectivity matrix is sufficiently large (bottom panel). B. Value of the maximum amplification of the system (quantified by the maximal singular value σ1(P) of the propagator, see Methods) as a function of the non-normal parameter Δ. Here we fix the two eigenvalues of J, the largest of which effectively determines the largest timescale of the dynamics, and vary Δ. Colored traces correspond to different values of the largest timescale of the system . For small values of Δ the maximum amplification is equal to one, and it increases approximately linearly when Δ is larger than the critical value. Each colored trace corresponds to a different choice of Tr(J) and Det(J). From top to bottom traces: Tr(J) = 0, −0.5, −2, −4 and Det(J) = Tr2(J)/4 (for convenience), corresponding respectively to τ = 1, 0.8, 0.5, 0.33. The trace for τ = 1 corresponds to . C. Dynamical regimes for an excitatory-inhibitory two population model, as in [32]. Here w represents the weights of the excitatory connections (J = J = w) and −kw the weights of the inhibitory ones (J = J = −kw). The inhibition-dominated regime corresponds to k > 1. The color code corresponds to the maximum amplification, as quantified by the maximal singular value σ1(P). The grey trace corresponds to the boundary between the monotonic and the amplified parameter regions. The red trace represents the stability boundary, with the unstable region hatched in red. In order to achieve transient amplification the excitatory weight w has to be approximately larger than unity, when k is larger than but close to one. Note that amplification can be obtained also for 0 < k < 1, in a parameter region limited by the stability boundary. A second illustrative example is a network of N randomly connected neurons, where each connection strength is independently drawn from a Gaussian distribution with zero mean and variance equal to g2/N. For such a network, the eigenvalues of J and J are random, but their distributions are known. The eigenvalues of J are uniformly distributed in the complex plane on a circle of radius g [41], so that the system is stable for g < 1 (Fig 3A). On the other hand, the eigenvalues of the symmetric part J are real and distributed according to the semicircle law with spectral radius [42, 43] (Fig 3B). The fact that the spectral radius of J is larger by a factor than the spectral radius of J implies that if g is in the interval the network is stable but exhibits amplified transient activity (Fig 3C). Note that the connectivity is non-normal for any value of g, but the additional condition is needed for the existence of amplified trajectories. This in particular implies that for random connectivity transient amplification requires the network to be close to instability, so that the dynamics are slowed down as pointed out in [34].

Fig 3

Dynamical regimes of a N-dimensional network model with random Gaussian connectivity structure.

Each entry of J is independently drawn from a Gaussian distribution with zero mean and variance g2/N. A. The eigenvalues of J are complex, and, in the limit of large N, distributed uniformly within a circle of radius R(J) = g in the complex plane (Girko’s law, [41]). The system is stable if g < 1. Left: g = 0.5. Right: g = 0.9. B. The eigenvalues of the symmetric part J are real-valued, and are distributed in the large N limit according to the semicircle law, with the largest eigenvalue of J given by the spectral radius [42, 43]. Since the spectral radius of J is larger than the spectral radius of J, for sufficiently large values of g some eigenvalues of J can be larger than unity (in red), while the network dynamics are stable (g < 1). C. Spectral radii of J and J as a function of the random strength g. The interval of values of g for which the system displays strong transient dynamics in response to specific inputs is given by . N = 200 in simulations.

Dynamical regimes of a N-dimensional network model with random Gaussian connectivity structure.

Coding with amplified transients

For a connectivity matrix satisfying the amplification condition λmax(J) > 1, only specific external inputs r0 are amplified by the recurrent circuitry, while others lead to monotonically decaying trajectories (Fig 4B). Which and how many orthogonal inputs are amplified? What is the resulting state of the network at the time of maximal amplification, and how can the inputs be decoded from that state?

Fig 4

Coding multiple stimuli with amplified transient trajectories.

Coding multiple stimuli with amplified transient trajectories.

Example corresponding to a N-dimensional Gaussian connectivity matrix with g = 0.9. A. Singular values of the propagator, , as a function of time (SV trajectories). Dark blue traces show the amplified singular values, defined as having positive slope at time t = 0; The dominant singular value corresponds to the dashed line. Light blue traces correspond to the non-amplified singular values, having negative slope at t = 0. B. Norm of the activity elicited by the first two amplified inputs, i.e. , , (right singular vectors corresponding to singular values and at time t* indicated by the dashed vertical line in panel A; purple and red traces), and by one non-amplified input (chosen as , corresponding to ; orange trace). C. Illustration of the dynamics elicited by the three inputs, R1, R2 and R100 (shown in different rows), as in B. Left: Activity of 10 individual units. Center: Projections of the evoked trajectories onto the plane defined by the stimulus and the corresponding readout vector (in analogy with the amplified case, we chose the readout of the non-amplified dynamics to be the state of the system at time t*, i.e. ). Right: population responses to the three stimuli projected on the readout vectors , and . N = 1000 in simulations. One approach to these questions is to examine the mapping from inputs to states at a given time t during the dynamics. Since we consider linear networks, the state reached at time t from the initial condition r0 is given by the linear mapping r(t) = P r0, where for any time t > 0, P = exp(t(J − I)) is an N × N matrix called the propagator of the network. At a given time t, the singular value decomposition (SVD) of P defines a set of singular values , and two sets of orthonormal vectors and , such that P maps onto . In other words, taking as the initial condition leads the network to the state at time t: If , the norm of the activity at time t is larger than unity, so that the initial condition is amplified. In fact, the largest singular value of P determines the maximal possible amplification at time t (see Methods). Note that for a normal matrix, the left and right singular vectors and are identical, and the singular values are equal to the modulus of the eigenvalues, so that the stability of the dynamics imply an absence of amplification. Conversely, stable amplification implies that and are not identical, so that an amplified trajectory explores at least two dimensions corresponding to the plane spanned by and . Since the propagator P depends on time, the singular vectors and , and the singular values depend on time. One can therefore look at the temporal trajectories , which by definition all start at one at t = 0 (Fig 4A). If the connectivity satisfies the condition for transient amplification, at least one singular value increases above unity, and reaches a maximum before asymptotically decreasing to zero. The number of singular values that simultaneously take values above unity (Fig 4A) defines the number of orthogonal initial conditions amplified by the dynamics. Choosing a time t* at which N of the singular value trajectories lie above unity, we can indeed identify a set of N orthogonal, amplified inputs corresponding to the right singular vectors of the propagator at time t*. According to Eq (5), each of these inputs is mapped in an amplified fashion to the corresponding left singular vector at time t*, which also form an orthogonal set. Each amplified input can therefore be decoded by projecting the network activity on the corresponding left singular vector (Fig 4C). Since are mutually orthogonal, the different initial conditions lead to independent encoding channels. Again, as the dynamics are non-normal, the inputs R and the outputs L are not identical, so that the dynamics for each amplified input are at least two-dimensional (Fig 4C). How many independent, orthogonal inputs can a network encode with amplified transients? To estimate this number, a central observation is that the slopes of the different singular value trajectories at t = 0 are given by the eigenvalues of the symmetric part of the connectivity J. This follows from the fact that the singular values of the propagator P are the square root of the eigenvalues of , and at short times δt we have . This implies that the number of singular values with positive slope at the initial time is equal to the number of eigenvalues of the symmetric part J larger than unity. To eliminate the trajectories with small initial slopes, one can further constrain the slopes to be larger than a margin ϵ, in which case the number of amplified trajectories N(ϵ) is given by the number of eigenvalues of J larger than 1 + ϵ. Note that N(ϵ) provides only a lower bound on the number of amplified inputs, as singular values with initial slope smaller than zero can increase at later times. It is straightforward to compute N(ϵ) when the connectivity matrix J is a random matrix with independent and identically distributed elements. In this case the probability distribution of the eigenvalues of its symmetric part J follows the semicircle law (Fig 3), and when the number of neurons N is large, the number N of amplified inputs scales linearly with N. To summarize, the amplified inputs and the corresponding encoding at peak amplification can be determined directly from the singular value decomposition of the propagator, given by the exponential of the connectivity matrix. For an arbitrary N × N matrix J, characterizing analytically the SVD of its exponential is in general a complex and to our knowledge open mathematical problem. For specific classes of matrices, the propagator and its SVD can however be explicitly computed, and in the following we will exploit this approach.

Implementing specific transient trajectories

The approach outlined above holds for any arbitrary connectivity matrix, and allows us to identify the external inputs which are strongly amplified by the recurrent structure, along with the modes that get most activated during the elicited transients, and therefore encode the inputs. We now turn to the converse question: how to choose the network connectivity J such that it generates a pre-determined transient trajectory. Specifically, we focus on low-rank networks, a type of model ubiqitous in neuroscience [44, 45], and set out to determine the minimal-rank connectivity that transiently transforms a fixed, arbitrary input r0 into a fixed, arbitrary output w at the time of peak amplification, through two-dimensional dynamics. To address this question, we consider a connectivity structure given by a unit-rank matrix J = Δuv [45]. Here u and v are two vectors with unitary norm and correlation ρ (〈u, v〉 = ρ), and Δ is an overall scaling parameter. We applied to this connectivity the general analysis outlined above (see Methods). The only non-zero eigenvalue of J is Δρ, and the corresponding linear system is stable for Δρ < 1. The largest eigenvalue of the symmetric part of the connectivity J is given by Δ(ρ + 1)/2, so that the network displays amplified transients if and only if Δ(ρ + 1)/2 > 1 (while Δρ < 1). Keeping the eigenvalue Δρ constant and increasing Δ will therefore lead to a transition from monotonically decaying to amplified transients (Fig 5A). If ρ = 0, the vectors u and v are orthogonal, and the condition for amplification is simply Δ > 2. Note that in this situation, amplification is obtained without slowing down the dynamics, in contrast to randomly coupled networks [34].

Fig 5

Low-dimensional amplified dynamics in random networks with unit-rank structure.

Low-dimensional amplified dynamics in random networks with unit-rank structure.

A. Dynamical regimes as a function of the structure vector correlation ρ = u ⋅ v and the scaling parameter of the connectivity matrix, Δ. Grey shaded areas correspond to parameter regions where the network activity is monotonic for all inputs; blue shaded areas indicate parameter regions where the network activity is amplified for specific inputs; for parameter values in the white area, activity is unstable. Samples of dynamics are shown in the bottom panels, for parameter values indicated by the colored dot in the phase diagram: Δ = 4 and ρ = 0. Dashed colored traces correspond to the parameter regions explored in panels B. and C., defined by the equation λ = Δρ. B. Maximum amplification of the system, quantified by σ1(P), the first singular value of the propagator, as a function of the scaling parameter Δ. Here we fix the eigenvalue of the connectivity matrix λ = Δρ associated with the eigenvector u, and vary Δ. Colored traces correspond to different choices of the eigenvalue of the connectivity λ. C. Correlation between the optimally amplified input direction and the structure vector v as a function of the parameter Δ. Increasing the non-normal parameter Δ aligns the optimally amplified input with the structure vector v. In B. and C. mean and standard deviation over 50 realizations of the connectivity matrix are shown for each trace. The elements of the structure vectors are drawn from a Gaussian distribution, so that they have on average unit norm and correlation ρ (see Methods). D. Low-dimensional dynamics in the case of two stored patterns. Input v(1) (resp. v(2)) elicits a two-dimensional trajectory which brings the activity along the other structure vector u(1) (resp. u(2)), mapping stimulus v(1) (resp. v(2)) into its transient readout u(1) (resp. u(2)). Blue and red colors correspond to the two stored patterns. E. Firing rates of 10 individual units. F. Temporal evolution of the activity norm. G. Projection of the network response evoked by the input along v(1) (resp. v(2)) on the corresponding readout u(1) (resp. u(2)). The case of unit rank connectivity (one stored pattern) reduces to the first row of panels D. −G. (where the activity on u(2) is equivalent to the activity on a readout orthogonal to u(1)). N = 3000 in simulations. For this unit rank connectivity matrix, the full propagator P = exp(t(J − I)) of the dynamics can be explicitly computed (see Methods). The non-trivial dynamics are two-dimensional, and lie in the plane spanned by the structure vectors u and v (Fig 5D), while all components orthogonal to this plane decay exponentially to zero. Determining the singular value decomposition of the propagator allows us to compute the amount of amplification of the system, as the value of σ1(P) at the time of its maximum t*. In the amplified regime (for Δ(ρ + 1)/2 > 1), the amount of amplification increases monotonically with Δ (Fig 5B). Since only one eigenvalue of J is larger than unity, only one input perturbation is able to generate amplified dynamics. For large values of Δ, this optimal input direction is strongly correlated with the structure vector v. Perturbing along the vector v elicits a two-dimensional trajectory which at its peak amplification is strongly correlated with the other structure vector u (Fig 5C). Choosing v = r0 and u = w, the unit-rank connectivity therefore directly implements a trajectory that maps the input r0 into the output w, identified as the transient readout vector for stimulus r0. Several, orthogonal trajectories can be implemented by adding orthogonal unit-rank components. For instance, taking J = Δu(1)v(1) + Δu(2)v(2), where the planes defined by the structure vectors in each term are mutually orthogonal, the input v(1) evokes a trajectory which is confined to the plane defined by u(1) and v(1), and which maps the input v(1) into the output u(1) at the time of peak amplification (Fig 5D–5G). Similarly, the input v(2) is mapped into the output u(2) during the evoked transient dynamics. Therefore, the rank-2 connectivity J implements two transient patterns, encoding the stimuli v(1) and v(2) into the readouts u(1) and u(2). A natural question is how robust the scheme is and how many patterns can be implemented in a network of fixed size N.

Robustness and capacity

To investigate the robustness of the transient coding scheme implemented with unit-rank terms, we first examined the effect of additional random components in the connectivity. Adding to each connection a random term of variance g2/N introduces fluctuations of order to the component of the activity on the plane defined by u and v (see Methods). Consequently, the projection of the trajectory on the readout w = u has fluctuations of the same order (Fig 6A–6C). A supplementary effect of random connectivity is to add to the dynamics a component orthogonal to u and v, proportional to Δ (see S8 Text), which however does not contribute to the readout along w. Thus, for large N, the randomness in the synaptic connectivity does not impair the decoding of the stimulus r0 from the activity along the corresponding readout w.

Fig 6

Robustness of the transient coding scheme and capacity of the network.

Robustness of the transient coding scheme and capacity of the network.

(A-B-C) Robustness of the readout activity for a single stored pattern u-v in presence of randomness in the connectivity with variance g2/N. A. Projection of the population activity elicited by input v along the readout u (red trace) and along a readout orthogonal to u (blue trace) for g = 0.5. The elements of the orthogonal readout are drawn from a random distribution with mean zero and variance 1/N and are fixed over trials. The projection of the activity on u is also shown for the zero noise case (g = 0; black dashed line). B. Value of the activity along u (red dots) and along the orthogonal readout (blue dots) at the peak amplification (t = t*), as a function of g. In A and B, N = 200; error bars correspond to the standard deviation over 100 realizations of the random connectivity. C. Standard deviation of the readout activity at the peak amplification as a function of the network size N for two values of g. The fluctuations are inversely proportional to the network size and scale as . Error bars correspond to the standard deviation of the mean over 100 realization of the connectivity noise. (D-E-F) Robustness of the transient coding scheme in presence of multiple stored patterns. D. Projection of the population activity elicited by one arbitrary amplified input v( along the corresponding readout u( (red trace) and along a different arbitrary readout u( (blue trace) for P/N = 0.02. The readout u( was changed for every trial. The projection of the activity on u( is also shown when only the pattern u(-v( is encoded (P = 1; black dashed line). E. Value of the activity along u( (red dots) and along the readout u( (blue dots) at the peak amplification (t = t*), as a function of P/N. In D and E N = 200; error bars correspond to the standard deviation over 100 realizations of the connectivity matrix. F. Standard deviation of the readout activity (along u() at the peak amplification as a function of the network size N for two values of P/N. The fluctuations are inversely proportional to the network size and scale as . Error bars correspond to the standard deviation of the mean over 100 realizations of the connectivity noise. The robustness of the readouts to random connectivity implies in particular that the unit-rank coding scheme is robust when an extensive number P of orthogonal transient trajectories are implemented by the connectivity J. To show this, we generalize the unit-rank approach and consider a rank-P connectivity matrix, given by the sum of P unit-rank matrices, , where each term specifies an input-output pair. We focus on the case where the elements of all the vectors u( and v( are independently drawn from a random distribution (see Methods), implying that all input-output pairs are mutually orthogonal, i.e. uncorrelated, in the limit of large N. In this situation, the interaction between the dynamics evoked by one arbitrary input v( and the additional P − 1 patterns is effectively described by a system with connectivity J = Δu( v( corrupted by a random component with zero mean and variance equal to Δ2 P/N2 (see Methods). From the previous results, it follows that the fluctuations of the activity of the readout u( are of order (Fig 6D–6F). Thus, in high dimension, the readout activity is robust to the interactions between multiple encoded trajectories. When the number of encoded trajectories is extensive (P = O(N)), each stimulus (v(, can therefore still be decoded from the projection of the activity on the corresponding readout u(. A natural upper bound on the number of trajectories that can be implemented by the connectivity J is derived from the stability constraints of the linear system. Indeed, the largest eigenvalues of J is given by and it needs to be smaller than one for stability. Thus, the maximum number of trajectories that can be encoded in the connectivity J is given by Pmax = N/Δ2 and defines the capacity of the network. Crucially, the capacity scales linearly with the size of the network N. The capacity also decreases for highly amplified systems, resulting in a trade-off between the separability of the neural activity evoked by different stimuli (quantified by Δ) and the number of stimuli that can be encoded in the connectivity (quantified by Pmax).

Discussion

We examined the conditions under which linear recurrent networks can implement an encoding of stimuli in terms of amplified transient trajectories. The fundamental mechanism underlying amplified transients relies on the non-normal properties of the connectivity matrix, i.e. the fact that the left- and right-eigenvectors of the connectivity matrix are not identical [37]. A number of recent studies in theoretical neuroscience have pointed out the interesting dynamical properties of networks with non-normal connectivity [28, 31–35, 46, 47]. Several of these works [28, 32, 34, 35] have examined the amplification of the norm of the activity vector, as we do here. However, it was not pointed out that the presence of amplification can be diagnosed by considering the eigenvalues of the symmetric part J of the connectivity matrix (rather than examining properties of the eigenvectors of the connectivity matrix J), leading to the distinction of two classes of recurrent networks. This general criterion appears to be well-known in the theory of linear systems (Theorem 17.1 in [37]). Here we applied it to standard models of recurrent networks used in computational neuroscience, and in particular to low-rank networks [45]. We have shown that the largest eigenvalue of the symmetric part of the connectivity defines the amplification properties of the system asymptotically at small times t. Yet, it does not provide a direct measure of the maximum amplification of the norm ∥r(t)∥ over all times t (see Maximum amplification of the system). The maximum amplification can be derived using the singular value decomposition (SVD) of the propagator P, which however can be computed analytically only for simple connectivity matrices. To quantify the maximum amplification other measures have been developed that rely on the so-called pseudospectra [37] of the connectivity matrix, a generalization of the eigenvalue spectra useful for the study of non-normal dynamics. While the spectrum of the symmetric part of the connectivity controls the amplification of the system at small times and the eigenvalue spectrum determines its large time dynamics, the transient dynamics at intermediate times is largely determined by the properties of the pseudospectra of the connectivity J (Chapter 4 of [37]). Notably, the result known as the Kreiss Matrix Theorem (Eq. 14.8 and 14.12 in [37]) provides a lower and upper bound for the maximum amplification of ∥r(t)∥ based on the pseudospectrum of the connectivity J. Applying the criterion for transient amplification to classical randomly connected networks, we found that amplification occurs only in a narrow parameter region close to the instability, where the dynamics substantially slow down as previously shown [34]. To circumvent this issue, and produce strong transient amplification away from the instability, [28] introduced stability-optimized circuits (SOCs) in which inhibition is fine-tuned to closely balance excitation, and demonstrated that such dynamics can account for the experimental data recorded in the motor cortex [24]. We showed here that low-rank networks can achieve the same purpose, and exhibit strong, fast amplification in a large parameter region away from the instability. One difference with SOCs is that low-rank networks explicitly implement low-dimensional dynamics that transform a specified initial state into a specified, orthogonal output state. Several low-rank channels could be combined to reproduce higher-dimensional dynamics similar to those observed during the generation of complex movements [24]. In our framework we modeled the external stimulus as the initial condition of the network, and the amplified dynamics is autonomously generated by the recurrent interactions. Although this might appear as an oversimplifying assumption, it has nevertheless been proven useful to describe the transient population activity in motor and sensory areas. In motor and pre-motor cortex, the initial condition of the population dynamics during the execution of the movement may be set by the phase of preparatory neural activity that precedes the movement, and may determine to a large extent the time course of the movement-related dynamics [22-24]. A similar mechanism has been recently proposed to underlie the generation and population coding properties of strong sensory responses following stimulus offsets in auditory cortex. Here different auditory stimuli result in largely orthogonal initial conditions at the stimulus offset, thus generating orthogonal population offset responses across stimuli [48]. The assumption of autonomous dynamics does not hold when naturalistic (e.g. temporally structured) stimuli are considered [49]. Understanding how more complex external inputs are transformed by the non-normal amplified network dynamics constitutes a major direction of future work. The study by Murphy and Miller [32] reported that the excitatory-inhibitory (EI) structure of cortical networks induces non-normal amplification between so-called sum and difference E-I modes. Interestingly, the specific networks they considered are of the low-rank type, with sum and difference modes corresponding to left- and right- vectors of the individual unit-rank terms [35]. This connectivity structure is therefore a particular instance of the low-rank implementation of amplified trajectories that we described here. Moreover, Murphy and Miller specifically focused on the inhibition-dominated regime [50], which as we show approximately corresponds to the class of unit-rank E-I networks that exhibit strong transient amplification (Fig 2 and Supp Info; note that these networks can exhibit amplification also for 0 < k ≤ 1, in a parameter region limited by the stability boundary). In the present study, we have not enforced a separation between excitatory and inhibitory neurons, but this can be done in a straightforward way by adding a unit-rank term in which all excitatory (resp. inhibitory) connections have the same weight, and these weights are chosen strong enough to make all excitatory (resp. inhibitory) synapses positive (resp. negative). This additional component would induce one more amplified channel that would correspond to the global E-I difference mode of Murphy and Miller. Here our aim was to produce amplified, but not necessarily long-lasting transients. The timescale of the transients generated using the unit-rank implementation is in fact determined by the effective timescale of the network, set by the dominant eigenvalue of the connectivity matrix. As shown in previous studies that focused on implementing transient memory traces [31, 33, 46], longer transients can be obtained either by increasing recurrent feedback (i.e. the overlap between vectors in the unit-rank implementation), or by creating longer hidden feed-forward chains. For instance, an effective feed-forward chain of length k can be obtained from a rank k connectivity term of the type J = Δv(v( +…+ Δv(3)v(2) + Δv(2)v(1), i.e. in which each term feeds into the next one [51]. This leads in general to a k + 1-dimensional transient with a timescale extended by a factor k [33]. Implementing this kind of higher-dimensional transients naturally comes at the cost of reducing the corresponding capacity of the network. The implementation of transient channels proposed here clearly bears a strong analogy with Hopfield networks [44]. The aim of Hopfield networks is to store patterns of activity in memory as fixed points of the dynamics, and this is achieved by adding to the connectivity matrix a unit-rank term for each pattern . One key difference with the present network is that Hopfield networks rely on symmetric connectivity [52], while amplified transients are obtained by using strongly asymmetric terms in which the left- and right-vectors are possibly orthogonal. Another difference is that Hopfield networks rely on a non-linearity to generate fixed points for each pattern, while here we considered instead linear dynamics in the vicinity of a single fixed-point. The non-linearity of Hopfield networks endows them with error-correcting properties, in the sense that a noisy initial condition will always lead to the activation of a single memorized pattern. A weaker form of error-correction is also present in our linear, transient encoding, since any component along non-amplified directions will decay faster than the amplified pattern. However, if two amplified patterns are simultaneously activated, they will lead to the activation of both corresponding outputs. This absence of competition may not be undesirable, as it can allow for the simultaneous encoding, and possibly binding, of several complementary stimulus features. The amplified dynamics map specific external inputs onto orthogonal patterns of activity with larger values of the norm ∥r(t)∥. These dynamics however, along with amplifying the amplitude of the signal, also amplify the external noise that is injected in the network. This external noise is maximally amplified along the readout dimensions corresponding to the amplified inputs (Eq (83)), implying that the signal-to-noise ratio at the peak amplification is comparable to the SNR at the initial state. Therefore, transient amplification may not favor stimulus decoding during the transient state with respect to the initial state in presence of noisy input, but may nonetheless be needed to keep a stable value of the SNR across the transient dynamics (see Robustness of the readout activity). Instead, the amplification of the norm constitutes an advantage when the synaptic connections to the readout neurons are themselves corrupted by noise. As the noise in the readout weights, or observational noise, is not directly fed into the network, it does not get amplified by the recurrent interactions. As a result, the detrimental effects of observational noise are overcome by amplifying the signal r(t) above the noise level, which can be directly implemented by the transient coding scheme illustrated here. Transient amplification of external inputs may therefore result in an increased ability to robustly decode the external input in presence of noisy readout synapses. While we focused here on linear dynamics in the vicinity of a fixed point, strong non-linearities can give rise to different transient phenomena [12]. In particular, one prominent proposal is that robust transient coding can be implemented using stable heteroclinic channels, i.e. sequences of saddle points that feed into each other [4]. This mechanism has been exploited in specific models based on clustered networks [5]. A general theory for this type of transient coding is to our knowledge currently lacking, and constitutes an interesting avenue for future work.

Methods

The network model

We study a recurrent network of N randomly coupled rate units. Each unit i is described by the time-dependent variable r(t), representing its firing rate at time t. The transfer function of the individual units is linear, so that the equation governing the temporal dynamics of the network reads: where τ represents the membrane time constant (fixed to unity), and J is the effective synaptic strength from neuron j to neuron i. In absence of external input, the system has only one fixed point corresponding to r = 0 for all i. To have stable dynamics, we require that the eigenvalues of the connectivity matrix J be smaller than unity, i.e. . We write the external input as the product between a common time-varying component I(t), and a term r0, which corresponds to the relative activation of each unit. The terms r0, can be arranged in a N-dimensional vector r0, which we call the external input direction. Here we focus on very short external input durations (I(t) = δ(t)) and on input directions of unit norm (∥r0∥ = 1). This type of input is equivalent to setting the initial condition to r(0) = r0. Since we study a linear system, varying the norm of the input direction would result in a linear scaling of the dynamics.

Dynamics of the network

We first outline the standard approach to the dynamics of the linear network defined by Eq (6) (see e.g. [36, 53]). The solution of the differential equation given by Eq (6) can be obtained by diagonalizing the linear system, i.e. by using a change of basis such that the connectivity matrix in the new basis Λ = V−1 JV is diagonal. The matrix V contains the eigenvectors v1, v2, …, v of the connectivity J as columns, while Λ has the corresponding eigenvalues λ on the diagonal. Therefore the variables represent the components of the rate vector on the basis of eigenvectors of J. In this new basis the system of coupled equations in Eq (6) reduces to the set of uncoupled equations The dynamics of the linear network given by Eq (6) can thus be written in terms of its components on the eigenvectors v as Equivalently, the solution of the linear system can be expressed as the product between a linear, time-dependent operator P and the initial condition r0 [54]: The linear operator P is called the propagator of the system and it is defined as the matrix exponential of the connectivity matrix J, i.e. P = exp(t(J − I)/τ). By using the definition of matrix exponential in terms of power series, we can express the propagator as . From Eq (9) we note that the propagator P at time t defines a mapping from the state of the system at time t = 0, i.e. the external input direction r0, to the state r(t).

Dynamics of the norm

To study the amplification properties of the network, we follow [39] and focus on the temporal dynamics of the population activity norm ∥r(t)∥ [28]. The equation governing the dynamics of the norm can be derived by writing , so that the relative rate of change of the norm is given by [39] By using Eq (6) we can write the right hand side of the previous equation as where we introduced J = (J + J)/2, the symmetric part of the connectivity matrix J. Both the eigenvalues and the eigenvectors of J provide information on the transient dynamics of the system. On one hand, we show in the main text that the activity norm can have non-monotonic behaviour if and only if at least one eigenvalue of the matrix J is larger than one. Therefore the eigenvalues of J determine the type of transient regime of the system. On the other hand, as J is symmetric, its set of eigenvectors is orthogonal and provides a useful orthonormal basis onto which we can project the dynamics. In this basis, the connectivity matrix is given by , where V contains the eigenvectors of J as columns. The matrix J can be uniquely decomposed as J = J + J, where J = (J − J)/2 is the anti-symmetric part of J, so that The first term on the right hand side is a diagonal matrix, while the second term is an anti-symmetric matrix. Since the latter has zero diagonal elements, the new connectivity matrix J′ displays the eigenvalues of J on the diagonal. The off-diagonal terms of J′ are given by the elements of and represent the strength of the couplings between the eigenvectors of J. In the amplified regime, some of the eigenvalues of J are larger than one, so that without the coupling between the modes of J, the connectivity J′ would be unstable. However, in our case J and J′ are stable matrices, meaning that the coupling terms ensure the stability of the overall system. Moreover, varying the strengths of the coupling terms while keeping fixed the diagonal terms affects in a non-trivial way the maximum amplification of the system. Therefore, the decomposition in Eq (12) allows us to identify the set of key parameters that controls the maximum amplification of a specific system. In the following, we will systematically use this decomposition to analyze specific classes of matrices.

Amplification

To identify which inputs are amplified, we examine the dynamics of the activity norm ∥r(t)∥ for an arbitrary external input r0. The one-dimensional Eq (11) alone is not enough to determine the time course of ∥r(t)∥, since the right hand side depends on the solution of the N—dimensional system Eq (6). Therefore, for a specific input r0, we can use Eq (9) and write the norm of the elicited trajectory as

Input-output mapping between amplified inputs and readouts

The dynamics elicited in response to an input along an arbitrary direction is in general complex. However, the singular value decomposition (SVD) of the propagator provides a useful way to understand the network dynamics during the transient phase. Any matrix A can be written as where the matrix Σ contains the singular values σ(A) on the diagonal, while the columns of L (resp. R) are the left (resp. right) singular vectors of A, i.e. the eigenvectors of AA (resp. A A). The matrices R and L are unitary, meaning that they separately provide two orthogonal sets of unitary vectors. Thus, we can write the SVD of the propagator as From Eq (15) we see that, at a given time t, the propagator P maps each right singular vector into the left singular vector , scaled by the singular value (see Eq (5)). Note that for normal systems the singular value decomposition and the eigen-decomposition coincide. In this case the matrices L and R both contain the eigenvectors of P as columns, so that and lie on a single dimension. Instead, for a non-normal system the right and left singular vectors do not align along one direction, and the dynamics of the system in response to an input along spans at least the two dimensions defined by the two vectors and . The vectors for which correspond to the amplified inputs at time t, while the outputs are the corresponding readouts at time t.

Number of amplified inputs

The number of amplified inputs at time t is given by the number of singular values larger than unity. To estimate this number, we examine the temporal dynamics of the singular values in time (SV trajectories). We observe that, for a system in the amplified regime (λmax(J) > 1), at least one of the SV trajectories has non-monotonic dynamics, starting from one at t = 0 and then increasing before decaying to zero. In fact, the singular values of the propagator at small times t = δt are defined as the square roots of the eigenvalues of From Eq (16) we can compute the singular values of P as so that the slope at time t = 0 of the k-th singular value of the propagator is Eq (18) shows that the number of singular values larger than unity at small times is given by the number of the eigenvalues of J larger than unity, which we denote as N.

Maximum amplification of the system

From Eq (15) we see that the maximum over initial conditions of the amplification at time t corresponds to the dominant singular value of the propagator, . The associated amplified input and corresponding readout are respectively and . To obtain the maximum amplification of the system over inputs and over time, we need to compute the time t* at which attains its maximum value. Therefore, the value quantifies the maximum amplification over inputs and over time, while and correspond respectively to the most amplified input direction and the associated readout. Interestingly, it can be shown that the input satisfies the equation (see S1 Text) which depends only on the symmetric part of the connectivity matrix J. We will exploit this equation to identify the amplified initial condition in specific cases. Note that, except for N = 2, Eq (19) does not fully specify the maximally amplified input.

Characterizing transient dynamics—Summary

Summarizing, our approach for characterizing its transient dynamics can be divided into three main steps: Compute J, along with its eigenvalues and eigenvectors. Compute the propagator of the system P. Compute the Singular Value Decomposition (SVD) of the propagator. These three steps can be in principle performed numerically for any connectivity matrix. For particular classes of connectivity matrices, we show below that some or all three steps are analytically tractable.

Random Gaussian network

Here we consider a non-normal random connectivity matrix with synaptic strength independently drawn from a Gaussian distribution The eigenvalues of J are complex and uniformly distributed in a circle of radius g [41]: For this class of matrices, we can analytically determine the condition for amplified transients, and estimate the number of amplified inputs. In the stable regime (g < 1), the symmetric part of the connectivity J can have unstable eigenvalues. In fact, the elements of the symmetric part are distributed according to From random matrix theory we know that the eigenvalues of the matrix given by Eq (22) are real and distributed according to the semicircle law [42, 43]: In particular, the spectral radius of J is , meaning that J has unstable eigenvalues if . To estimate the number of amplified initial conditions, we compute the lower bound on their number N(ϵ), i.e. the number of eigenvalues of J larger than 1 + ϵ: The number of eigenvalues of J is maximum when g is close to (but smaller than) unity. In this case Eq (24) at the first order in ϵ translates to Therefore, the maximal capacity of a randomly-connected network is therefore around 10%. Computing the SVD of the exponential of a N-dimensional random matrix is to our knowledge an open mathematical problem. Therefore, for an arbitrary random connectivity matrix, the maximal amount of amplification and the amplified initial conditions are accessible only by numerically computing the SVD of exp(t(J − I)).

Two-dimensional system

In this section we consider connectivity matrices describing networks composed of two interacting units of the form The eigenvalues of J determine the stability of the network and can be expressed in terms of its trace and determinant as follows: For the dynamics to be stable, the largest eigenvalue of J needs to satisfy , equivalent to the requirement that Tr(J − I) < 0 and Det(J − I) > 0. Note that if the two eigenvalues λ± are real, they are symmetrically centered around Tr(J)/2 on the real axis; if they are complex conjugates they have real part equal to Tr(J)/2 and are symmetrically arranged on either side of the real axis.

Eigenvalues and eigenvectors of J

The condition for transient amplification is determined by the two eigenvalues of J, which read: where we introduced the parameter Δ represents the difference between the off-diagonal elements of J, and provides a measure of how far from symmetric the connectivity matrix is (Δ = 0 meaning symmetric connectivity). Note that the equation for the eigenvalues of J (Eq 28) differs from the one for the eigenvalues of J (Eq 27) by the additive term 4Δ2 under the square root. Under the assumption of a stable connectivity J, there exists a critical value for Δ, given by: above which the rightmost eigenvalue of J is larger than one, meaning that specific inputs are transiently amplified. Note that for a stable J, we have Det(J − I) > 0, implying that Δ is real. Thus, Δ is the crucial parameter which determines the dynamical regime of the system.

Decomposition on the modes of J

To identify the parameters which determine the maximum amplification of a system, we project the network dynamics onto the orthonormal basis of eigenvectors of J. In the new basis the connectivity matrix is given by Eq (12). Interestingly, the non-normal parameter Δ directly appears in the expression of the anti-symmetric part J, so that we obtain up to a sign of the off-diagonal elements. From Eq (31) we see that the non-normal parameter Δ, which determines the dynamical regime of the system, also represents the strength of the coupling between the modes of J. For Δ > Δ we have . Thus, at small times, any component of the dynamics on the first mode of J is amplified by an amount proportional to . However, at later times, because of the recurrent feedback of strength Δ between the modes of J, the system reaches a finite amount of amplification and relaxes back to the zero fixed point. In the following we examine how the value of Δ determines the amount of amplification of the system.

Propagator of the dynamics

To examine the dependence of the maximum amplification of the system on the parameter Δ we compute the propagator P and its SVD. A convenient method to compute the exponential of a matrix is provided in [55] (see S2 Text), which we apply to J′ to obtain where the time-dependent functions x0(t) and x1(t) are given by Here λ+ and λ− are the eigenvalues of J (Eq 27).

SVD of the propagator

In order to compute the maximum amplification of the system we next compute the largest singular value of the propagator σ1(P) (see S3 Text): where Here we compute the maximal amount of amplification by evaluating the maximum value in time of the amplification envelope σ1(P) (Eq 34), and examine its dependence on the non-normal parameter Δ. In particular we find that, for large values of Δ, this dependence is linear. To derive this relationship, we note that the combination depends on Δ, while does not. Therefore in Eq (35) only the functions H(t) and F(t) depend on Δ. In the amplified regime Δ ≫ Δ, we have that H(t) ≫ E(t) for times t ≫ 1/Δ (while for small times δ ≪ 1/Δ we have E(δt) = 1 + Tr(J)δt/2 ≫ Δδt = H(δt)). In addition, for large values of Δ, we can write so that the singular value can be written as To find the value of the maximum amplification we need to compute the time t* of occurrence of the global maximum of σ1(P) and the value σ1(P). The final result is given by The two-dimensional model given by Eq (26) has four free parameters, namely the strengths of the four recurrent connections. In our analysis we fix the values of the trace Tr(J) and determinant Det(J) of the connectivity matrix, so that the dynamics are stable, and vary the parameter Δ. This implies fixing the eigenvalues λ± and the corresponding timescales . This approach allows us to explore how different degrees of symmetry in the connectivity, as quantified by Δ, influence the dynamics while keeping the timescales constant. Thus, we find that, for Δ ≫ Δ, and for fixed λ±, the maximum amplification of the system scales linearly with the non-normal parameter Δ.

Optimally amplified initial condition

Here we compute the optimal input direction by solving Eq (19). We parametrize the optimal input by the angle θ* it forms with the first mode of J, i.e. . Thus, Eq (19) translates into which is satisfied by

Rank-1 connectivity

In this section we consider a unit-rank connectivity matrix defined by where the vectors u and v are two N-dimensional vectors generated as where the vectors x1, x2 and y are N-dimensional vectors with components drawn from a Gaussian distribution with mean zero and variance 1/N and ρ is a number between −1 and 1 [45]. The average norm and correlation are given by 〈u ⋅ u〉 = 〈v ⋅ v〉 = 1 and 〈u ⋅ v〉 = ρ, and Δ is an overall scaling parameter. We consider only positive values of Δ, since a minus sign can be absorbed in the correlation coefficient ρ. The matrix J has N − 1 eigenvalues equal to zero and one eigenvalue given by λ = Δρ, associated with the eigenvector u. In the two-dimensional plane spanned by u and v, the direction orthogonal to v specifies another eigenvector of J corresponding to one of the zero eigenvalues. We first compute the eigenvalues and eigenvectors of the symmetric part of the connectivity J is a rank-2 matrix, meaning it has in general two non-zero eigenvalues given by Here denotes the determinant of J restricted to the uv-plane, i.e. the determinant of the 2 × 2 matrix [u, u⊥] J[u, u⊥], where u⊥ is a vector perpendicular to u on the uv-plane (the determinant of the full matrix J is zero because of the zero eigenvalues of J). We find that the two non-zero eigenvalues of the symmetric part J are given by (see S4 Text) Note that the eigenvalues of J are symmetrically centered around λ/2, and their displacement is controlled by the scaling parameter Δ. The condition for the system to be in the regime of transient amplification is therefore To compute the eigenvectors associated with the non-zero eigenvalues we have to solve the eigenvector equation Since the two eigenvectors lie on the uv-plane, we can write them in the form and . Solving the eigenvector equation for α and β yields α = 1 and β = −1. The two normalized eigenvectors of J are thus given by We can project the dynamics of the system on the basis of eigenvectors of J. Let V be the N-dimensional matrix containing the eigenvectors of J as columns: where the ’s are N − 2 arbitrary vectors orthogonal to both u and v. The projection of the connectivity matrix J onto the modes of J yields the new connectivity J′: From Eq (49) we see that the parameter Δ controls the strength of the coupling between the modes of J through the term . Thus, in the following analysis, we examine the amplification properties of the system as a function of the parameter Δ. We explicitly compute the expression of the propagator for the unit-rank system. From the definition of matrix exponential in terms of infinite sum of matrix powers we obtain Therefore the final expression for the propagator is given by where we introduced Note that the non-trivial dynamics of the system are restricted to the plane spanned by u and v. In fact any component of the initial condition orthogonal to this plane decays to zero as e−, as any component orthogonal to v in the uv-plane. From this it follows that non-monotonic transients occur only if the initial condition of the system has a non-zero component on the structure vector v. To study how the maximum amplification depends on Δ we compute the amplification envelope σ1(P). The singular values of the propagator P are given by the square roots of the eigenvalues of the matrix . From Eq (51) we can write We obtain the expression for the singular values of the propagator σ1,2(P) as a function of Δ and λ (see S5 Text): The other N − 2 singular values of P are equal to e−.

Choice of the free parameters

For the unit-rank system, two parameters out of Δ, λ and ρ can vary independently. Since we set Δ as a free parameter, we need to fix the second independent parameter. We explore three scenarios, which imply different scalings of λ or ρ with the parameter Δ: keep the eigenvalue λ constant, so as to fix the timescale τ = 1/(1 − λ), and vary Δ. In this case the correlation ρ between the u and v scales according to ρ = λ/Δ, meaning that increasing Δ makes the structure vectors more orthogonal to each other. Fix the correlation between the structure vectors, ρ, to a positive value and vary Δ. Increasing Δ has the effect to increase the timescale of the system τ = 1/(1 − Δρ), until a point where the system becomes unstable, i.e. for λ > 1, or equivalently Δ > 1/ρ. Keep ρ fixed to a negative value. In this case Δ can be increased without bounds and higher values of Δ decrease the timescale τ. The singular values of the propagator given by Eq (54) depend in a complex manner on Δ and λ. To understand how the maximum amplification of the system depends on Δ, we study the limit of very large Δ, defined as which we call the strong amplification regime. Note that in general the eigenvalue λ depends on Δ, according to λ(Δ) = Δρ. For fixed λ, Eq (55) is given by , while for a fixed value of ρ, Eq (55) translates into Δ ≫ 2(1−ρ) (with the additional constraint Δ < 1/ρ ensuring stability, in case ρ > 0). If condition given by Eq (55) is met, we can approximate Eq (54) for times t ≫ 2/Δ as For large Δ we can neglect the first two terms on the right hand side and write the largest singular value as The maximum amplification of the system corresponds to the maximum value in time of σ1(P). In the strong amplification regime (Eq 55) the time t* at which the singular value attains its maximum is independent of Δ and reads: Thus, the maximum amplification increases monotonically with Δ: where g(λ) is a multiplicative factor which depends on the eigenvalue λ. Different choices of the free parameters imply different growths of the maximum amplification with Δ: for λ fixed and ρ = λ/Δ, the maximum amplification increases linearly with Δ. For ρ > 0 fixed and λ = Δρ, the maximum amplification increases monotonically with Δ, until it reaches a value equal to Δ for Δ = 1/ρ (or λ = 1). For ρ < 0 fixed and λ = Δρ, the amplification increases monotonically with Δ, but it saturates at a value given by 1/|ρ|. This follows from the fact that In the case ρ = 0 the maximum amplification grows linearly as Δ/e, since The general observation that the largest eigenvalue of the symmetric part of J does not provide direct information about the maximum amount of amplification that the system can reach (maximum value of ∥r(t)∥ over time) is illustrated by the third case. Here ρ < 0 and the largest eigenvalue of the symmetric part of the connectivity is given by λmax(J) = Δ(1 + ρ)/2. Therefore, while λmax(J) can grow indefinitely by increasing the value of Δ, the maximum amplification saturates at a positive level given by 1/|ρ|. This indicates that the value of the largest eigenvalue of J is in general inadequate to characterize the maximum amplification of the system, for which other measures may be considered (see Discussion).

Optimally amplified initial condition and optimal readout

Using the result we found for the two dimensional case, Eqs (40) and (44), we can determine the angles and of the optimal initial condition and optimal readout with respect to the first mode of J as where the + and − signs correspond respectively to ans . The optimally amplified initial condition and optimal readout are thus given by Here we examine and in the strong amplification regime (Eq 55). We summarize our results as follows. For fixed λ and ρ = λ/Δ, we have up to the first order in Δ−1. In the strong amplification regime the second term on the right hand side is much smaller than unity, so that we can compute and at the first order in Δ−1. Denoting by and respectively the vectors orthogonal to v and u in the uv-plane, we can write In the strong amplification regime the optimal initial condition is thus strongly aligned with v and the optimal readout with the vector u. For fixed ρ > 0 and λ = Δρ, we compute the value of tanθ* for the largest value Δ can take before the system becomes unstable, i.e. Δ = 1/ρ. For this value we have Thus we have For fixed ρ < 0 and λ = Δρ, we can write so that In conclusion we find that, in the strong amplification regime, the optimal input has a strong component on the structure vector v, while the optimal readout is strongly aligned with u. In cases (2) and (3), however, this requires the additional condition that ρ be small.

Robustness of the readout to noise in the connectivity

In this section we study the dynamics of the system in presence of noise in the synaptic connectivity. We consider the connectivity matrix given by Eq (41), which implements a single transient pattern, and we add uncorrelated noise of standard deviation to each weight Δu v. The resulting connectivity matrix can be written as the sum of a structured unit-rank part and a Gaussian random matrix of the form [35] The elements of are independently drawn from a Gaussian distribution with zero mean and variance 1/N and are uncorrelated with the structured part. In the limit of large N, the matrix J has one eigenvalue equal to the eigenvalue of the unit-rank part, λ = Δρ, while the other N − 1 eigenvalues are uniformly distributed in a circle of radius g. This holds under the condition that the operator norm of the unit-rank part max∥Δuv x∥ is O(1) [56]. Since the structure vectors u and v have unit norm, the operator norm of the unit-rank part is equal to Δ. Therefore, if Δ is O(1), the condition for the stability of the system is max{λ, g} < 1.

Eigenvalues of J

To draw the phase diagram of the system, we compute the eigenvalues of the symmetric part of J where denotes the symmetric part of . The entries of are distributed according to We can express the eigenvalues of J as a function of g and of the eigenvalues of the symmetric part of the unit-rank matrix (see Eq 44) [57, 58]. In particular, the rightmost eigenvalue of J is given by where corresponds to the spectral radius of . We distinguish two cases: if , is larger than one only if the two conditions are satisfied. The first inequality is satisfied if or . Since for we have , the condition for the amplified regime becomes If , the inequality (λ + Δ)/2 + g2/(λ + Δ) > 1 is always satisfied for λ + Δ > 0, thus holding also for . From Eq (73) we conclude that, for , λmax(J) is larger than one independently of the values of λ and Δ. In the case , adding noise in the connectivity has a small effect on the phase diagram of the system. In fact, Eq (75) can be approximated as λ + Δ ≳ 2 − g2, which leads to a correction of order g2 to the condition for the amplified regime in absence of noise (see S1 Fig).

Robustness of the readout activity

Here we examine the magnitude of the fluctuations around the mean activity introduced by the random term in the connectivity given by Eq (70). In particular we assess the robustness of the readout projection of the response evoked by the optimal stimulus of the noiseless system, i.e. g = 0 (for a discussion on the effects of the connectivity noise on the activity orthogonal to the uv-plane see S8 Text). For simplicity, we assume that the correlation between the structure vectors, ρ, is close to zero, and that the condition for the strong amplification regime is satisfied (Eq (55)). Therefore, the optimal stimulus is strongly aligned with v, while the corresponding readout is u. We consider the system Each neuron receives independent noise with mean zero, variance σ2 and autocorrelation function 〈η(t)η(t′)〉 = δ δ(t − t′), where the angular brackets represent the average over the noise in the input and in the connectivity. In the limit of large N, the equation for the mean activity depends only on the structured part of the connectivity: Thus, the mean activity in response to an input along v is given by (see Eq 51) From Eq (77) we write the equation for the fluctuations of r(t) around the mean activity, δr(t) = r(t) − 〈r(t)〉, as where in the third term on the right hand side we neglected the corrections to r(t) due to the random component of the connectivity and input noise, keeping only the 0-th order term in g, i.e. 〈r(t)〉. Using Eq (78) we can write the solution of Eq (79) as The time-dependent correlation matrix C(t) = 〈δ r(t)δ r(t)〉 can be written as the sum of two terms, corresponding to the contributions of the noise in the connectivity (with variance proportional to g2) and the noise in the input (with variance σ2): where in the first term in the right hand side we used 〈χ χ〉 = δ δ/N. We start by computing the first term in Eq (81). Since the elements of the matrix propagator and the mean activity are known (see Eqs 51 and 78), we can compute for a given realization of the structured part (see S7 Text). The variance of the activity along the direction of the readout u due to the noise in the connectivity is computed by projecting the matrix C onto u. In particular we compute the variance of δr and at the peak of the transient phase (t* ≃ 1, see Eq (58)). As a result, the fluctuations of the readout activity at t = t* due to the noise in the connectivity read: and scale as (for large Δ). Computing the variance of the activity along the readout u due to the input noise yields (see S7 Text) From Eq (81), we can write the total amount of variability along the readout u at the peak amplification as Note that the fluctuations along u due to the noise in the input do not depend on the size of the network N. Therefore, in the limit of large N, only the input noise affects the readout activity significantly. By computing the signal-to-noise ratio (SNR) of the readout activity, we can assess the reliability of the readout in presence of input noise. The signal of the readout is simply the amplification level at the peak of the transient phase. Since for orthogonal structure vectors (ρ ≃ 0) the amplification grows as Δ/e, we find The readout is reliable if its signal-to-noise ratio is much larger than unity. Interestingly, for large values of Δ (see Eq 55), the SNR is independent of Δ, so that increasing the amplification does not improve the SNR significantly (see S2 Fig). In fact, for Δ ≫ 2, we can approximate Eq (85) as In this regime, the critical value of σ above which the SNR becomes smaller than unity is: We observe that the signal-to-noise ratio along the initial state (the vector v) is simply given by SNR0(σ) = 1/σ, so that the critical value of σ above which SNR0 becomes smaller than unity is given by σ = 1. As a result, the maximum gain in SNR that strongly amplified networks can achieve is less than 20%. The signal-to-noise ratio along the transient readout u and along the initial state v are therefore comparable. However, since SNR(σ;Δ)

Robustness to multiple stored patterns and capacity of the network

In this section we examine the robustness of the transient readouts when P transient trajectories are encoded in the connectivity J. We consider a connectivity matrix given by the sum of P unit-rank matrices where the elements of the vectors u( and v( are randomly and independently distributed with zero mean and variance equal to 1/N. Therefore, for large N and for P ≤ N/2, these vectors are close to orthogonal to each other, meaning that the correlation between all the pairs of structure vectors, ρ, is close to zero. For simplicity, we assume that the non-normal parameter Δ is the same for all stored trajectories. We first study the case of two stored transient trajectories (P = 2), then generalizing to an extensive number of patterns P = O(N).

Two encoded transient trajectories

The connectivity matrix in this case is given by Since the four structure vectors in Eq (89) are uncorrelated with each other, in the limit of large N, we can factorize the full propagator of the dynamics as the product of the propagators of the single unit-rank parts (see Eq (51)) and obtain (see S9 Text) where α(t; λ = 0) = t (see Eq 52). From Eq (90) we see that, in high dimensionality, the two transient patterns do not interact. In fact, any initial condition defined on the plane spanned by u(1) and v(1) evokes a two-dimensional trajectory which remains confined on the same plane. The same holds for the dynamics on the plane defined by u(2) and v(2).

Extensive number of encoded trajectories and capacity of the network

When the number of encoded trajectories P is of order N, we cannot factorize the propagator as in the case of two stored patterns, due to the stronger correlations between the 2P structure vectors u( and v(. However, the results for the case of one stored pattern with connectivity noise can be applied to this case if we write the connectivity matrix in Eq (88) as Here we isolate the first term of the sum but, since all the P patterns are statistically equivalent, the choice of the first pattern is arbitrary. The vectors u( and v( are uncorrelated with each other, so that we can consider the second term on the right hand side of Eq (91) effectively as noise in the connectivity J = Δu(1)v(1), with mean zero and variance Δ2 P/N2. In fact, the mean and the variance of the effective noise are given respectively by and Applying the results from the previous sections with , we can state that the noise coming from the additional P−1 patterns adds fluctuations of the order to the projection of the activity on the readout u(1) corresponding to the stimulus v(1). Since the number of encoded patterns P is extensive, the readout fluctuations scale as . However, when a number P of trajectories are encoded in J, we are not guaranteed that the connectivity has stable eigenvalues. Indeed, the eigenvalues of the matrix are distributed in a circle of radius (yet the spectral density is not uniform, since Eq (88) can be written as the product of two rectangular Gaussian matrices) [59]. Thus, to ensure overall stability we need , resulting in a maximal number of patterns Pmax that can be stored in the connectivity before the system becomes unstable. This number defines the capacity of the system and is given by If the vectors u( and v( are exactly orthogonal to each other for all p, therefore forming an orthonormal basis in , Eq (94) reduces to Pmax = N/2. From Eq (94) we see that, for fixed Δ, the number of transient trajectories that we can encode in the connectivity matrix scales linearly with the size of the system, N. The capacity of the system rapidly drops when Δ is increased, meaning that more amplified systems can encode less number of stimuli. When the structure vectors are orthogonal to each other as in our case (ρ ≃ 0), the system is amplified for Δ > 2 (see Eq 45). Therefore, Eq (94) evaluated at Δ = 2 provides an upper bound on the capacity for an amplified system with uncorrelated structure vectors: (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file. (PDF) Click here for additional data file.

Phase diagram for the unit-rank network with connectivity noise.

A. . The red line indicates the boundary between the monotonic and amplified parameter regions for g = 0.5. The grey dashed line corresponds to the case g = 0. B. . The dynamics are amplified regardless of the values of the parameters Δ and ρ. (TIF) Click here for additional data file.

Signal-to-noise ratio of the readout in presence of external input noise.

Signal-to-noise ratio of the readout as a function of the standard deviation of the input noise σ for two values of the non-normal parameter Δ. Non-amplified dynamics (Δ = 1) are less robust to noise than amplified dynamics (Δ = 4). Dashed lines correspond to the theoretical values (Eq 85). In simulations, N = 1000. Errorbars represent the standard deviation of the mean over 200 realizations of the connectivity matrix. (TIF) Click here for additional data file. 14 Nov 2019 Dear Dr Bondanelli, Thank you very much for submitting your manuscript, 'Coding with transient trajectories in recurrent neural networks', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved. We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.raised. In addition, when you are ready to resubmit, please be prepared to provide the following: (1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors. (2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text. (3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution. Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are: - Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition). - Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'. - Funding information in the 'Financial Disclosure' box in the online system. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. If you have any questions or concerns while you make these revisions, please let us know. Sincerely, Kenneth D. Miller Guest Editor PLOS Computational Biology Lyle Graham Deputy Editor PLOS Computational Biology A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK] The reviewers agree that this paper is a good and solid contribution but also raise many smaller issues that need to be addressed. Their reviews are clear, I would add just two very tiny comments to reviewer comments: reviewer 2 comment 1: if there are degenerate eigenvalues, then either the corresponding subspace must be missing an eigenvector, or that subspace of eigenvectors is normal (an orthonormal basis of eigenvectors can be chosen for that subspace). reviewer 3 comment 3: more generally for a normal matrix the singular values are the absolute values (or modulus, for complex eigenvalues) of the eigenvalue- singular values are nonnegative and real, eigenvalues can be negative or complex. Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: This paper provides a number of useful analytical results characterizing the strength and structure of non-normal transient amplification, which is of relevance to the study of dynamics in many biological networks, including recurrent neural networks. In the theoretical neuroscience literature, transient amplification has been previously proposed as a mechanism underlying fast spontaneous cortical fluctuations and selective amplification of neural patterns, as well as in networks with extensive working memory. While non-normality (non-orthogonality of eigenvectors) is necessary for transient amplification in linear networks, it is clear that any slight deviation from normality is not sufficient for creating amplification. The authors therefore present sufficient conditions for the latter (criteria in terms of eigenvalues of the symmetrized connectivity matrix, which have been known previously, at least in the math literature on transient amplification), which they also specialize to classes of network structures that have been of interest in theoretical neuroscience. They then provide a simple method for determining the spatial (i.e. in terms of neural population pattern) and temporal structure of transient amplification using a singular value decomposition of the linear network’s propagator. They use these tools to study transient amplification in effectively low dimensional (i.e. low-rank) as well as high dimensional random systems, and to study the robustness of selective amplification in the presence of random structural and dynamic perturbations. Finally they provide a proposal for constructing systems that can selectively (and transiently) amplify multiple specific patterns, and they quantify the corresponding capacity of such linear networks —i.e. how many on-average orthogonal patterns can be “learned” (i.e. included in the connectivity matrix), without disrupting the desired selective amplification. This is a nice paper with several technical results that could be quite useful to the theoretical/computational neuroscience community. Even though many of these results are easy to derive —and a few have been already published, as noted by authors, in the math literature on transient amplification— I still believe this is a useful paper for the community, in that it presents all such old and new results together, specializes them to some network structures of interest in neuroscience, and also makes proposals about their possible computational utility. The paper is also well-written, and discussions and expositions were clear and to-the-point. So I recommend its publication with minor corrections that I list below. Comments: 1. Lines 93-94 (and also lines 162): The language needs to be refined/made more precise. It’s, strictly speaking, meaningless to talk about identifying “different inputs giving rise to amplified trajectories” or estimating “their numbers”, or asking “… how many inputs are amplified?", since the input space (and its subspace with those properties) is a continuum, or more precisely a vector space with infinite members. You should rather say “identity the input subspace giving rise to amplified trajectories” and “estimating the dimensionality of this subspace”, and/or “how many orthogonal inputs [or input patterns] are amplified?". 2. Lines 126-127: you say “… except in the case of 126 one-dimensional dynamics or symmetric connectivity matrices.” The exception is not limited to symmetric matrices and should cover all normal J’s, so this phrase should be corrected. (E.g. consider anti-symmetric J’s which have imaginary eigenvalues: in that case the symmetric part is 0 and all its eigenvalues are 0, so the second condition is 0 >1, while because eigenvalues of J are imaginary the first condition is 0 < 1, and the two are inconsistent.) 3. Line 210: you end the sentence by “… when the connectivity J is Gaussian.", but the correct statement is “… when the connectivity matrix J is a random matrix with independent and identically distributed elements.” Gaussianity is not key (due to universality theorems on the circle and semicircular laws), but independence is key: so conversely a (symmetric) gaussian matrix with nontrivial covariances across elements need not give rise to the circle (semicircular) law. 4. Caption of Fig. 4: Change the caption for panel C to "C. Illustration of the dynamics elicited by the three inputs, R_1, R_2 and R_100 (show in different rows), as in B.” 5. Lines 228 and 231: you need to define what you mean by minimal connectivity. It seems that you mean a matrix with minimal rank. So try to justify why that notion of minimality is relevant to neuroscience. 6. Lines 280-292: the writing has to be made accurate regarding the “orthogonality" of different pattens. If the u_p/v_p vectors pairs for different p’s are literally mutually orthogonal (as the current text suggests) then there will be exactly zero interference between different rank-one terms (i.e. different patterns), and fluctuations of the activity along a readout u_p caused by other patterns will be exactly 0, and not just O(\\sqrt{P}/N). In this case the “capacity” would be N/2. But reading the Methods section it becomes clear that the authors are implicitly talking about a connectivity with the rank-P form with the different u’s and v’s sampled from a random ensemble of vectors with independently distributed components, such that different vectors are only approximately orthogonal with high probability (when N is large), and not literally orthogonal as the main text in lines 280-290 suggests. So this should be corrected. 7. Lines 27-28: replace “… but sufficient conditions for such amplification were not given.” with “… but general sufficient conditions for such amplification were not given.” 8. Line 53: I would add the following at the end of this sentence (first sentence of “Monotonic vs Amplified Transient Trajectories”): “… in the autonomous network with I(t) = 0." 9. Lines 740, 749 and 763: fix the broken (appearing as ?) references. 10. The brackets (signifying averaging) around dr_i on the left hand side should be removed. 11. In equation 80 (lines 782-783): the sum over the index l is only needed for the first term and the second term (the one involving \\eta_k) should not be summed over l. 12. Line 817: “randomly distributed” should be replaced with “randomly and independently distributed”. 13. Line 343: “straightforward” doesn’t have a hyphen. 14. Figure 6: In panel E, the maximum readout value for P/N ~ 0.02 seems to be significantly higher than the maximum of the red curve in panel D. Why? 15. Figure 2: in panel B, it would be helpful to also include a line for \\tau>1, corresponding to max Re \\lambda(J) >0. 16. Figure 2: panel C would be more inofrmative if it is turned into a heat-map of \\sigma_1^*, with the two phases (monotonic and amplified) and their boundary indicated/drawn on top of the heat-map. 17. In caption of Fig. 4, panel B: replace “… at time t_* in pannel [sic.] A” with “… at time t_* indicated by the dashed vertical line in panel A”. 18. Fig 4 C, last row, middle column: unlike in the top and middle row of panel C, the labels L_100^* and R_100^* are missing here. 19. Line 544: replace "symmetrically arranged along the imaginary dimension.” with "symmetrically arranged on either side of the real axis.” , Reviewer #2: The authors provide a clear and elegant explanation of the behavior of linear networks of neurons in terms of transient amplification. The paper establishes the basic tools needed to understand transient amplification (eigenvalues of the symmetric part of the connectivity matrix, SVD of the propagator…) and analyses in details random and low-rank connectivity matrix, together allowing the reader to develop an intuitive understanding of the phenomenon. I also liked very much the structuring of the paper into 3 parts: simple take-home messages, methods, and appendices. I congratulate the authors. The study will be very useful for teaching. I only have a couple of small comments: 1) I had the intuition that strong amplification required not only non-normality (lack of orthogonality of the eigenvectors of J), but also a **difference** between their corresponding eigenvalues. In this case, initial conditions loading on several non-orthogonal eigenvectors (i.e., those involving ‘cancellations’) align initially with the slowest decaying modes (this is the period of growth), and eventually decay. Is this intuition incorrect in general, or is this picture somehow hidden in the properties of J_s? 2) Transient amplification (TA) is usually studied in connection to specific initial conditions from which amplification happens. The ‘transient’ is then purely generated by the recurrent connectivity. From the point of view of neuroscience, I think it would be interesting to include some comments in the discussion about the kinds of experimental or real-life situations where this may be expected to occur or not. For instance, TA has been related to preparatory states before movement. How about onset sensory responses in cortex? How about time-varying inputs (which actually are the ethological norm)? The discussion on this would benefit from the recognition that, although linear models provide interesting approximations, the real networks have significant non-linearities, even at the firing rate level… (I say this because if everything is purely linear, there is perfect superposition in time and time-varying input is not particularly interesting...) Reviewer #3: # Summary and main comments This paper discusses non-normal mechanisms for transient amplification in linear recurrent networks. The authors study the distinction between two classes of networks: those that transiently amplify their inputs, and those that do not. They provide a necessary and sufficient condition on the connectivity matrix for a network to belong to the latter, and go on explaining i) how to find those inputs that are transiently amplified and ii) how to construct networks that can exploit such transient amplification effects to perform "coding". I am very strongly supportive of publication, as this is both a timely and very well executed paper, easy to read, well illustrated. I only have a couple of suggestions: 1. Perhaps my main comment concerns the old debate on matrix nonnormality, and how it is inherently difficult to characterise a concept that has to do with the geometry of a matrix' eigenbasis using a single scalar number (here, the numerical abscissa). In this paper, the distinction is made between networks in which the activity vector can grow transiently in response to an appropriately chosen input; and those that simply can't, regardless of the input. The authors derive a diagnostic criterion based on the numerical abscissa; while this is entirely valid (given the definition of "amplifying vs non-amplifying" networks outlined above), it is nevertheless an asymptotic criterion which only concerns the slope of the norm $\\| r(t)\\|$ of the response at small times ($t=0^+$) and therefore _in principle_ provides little insight about how much $\\| r(t) \\|$ will grow and for how long. To be fair, the authors go on discussing an analysis of the induced 2-norm of the propagator $e^{tA}$, which gives the full description of transients, but ─ as they acknowledge ─ whose SVD can only be computed numerically in most cases. There is a whole chapter in Trefethen & Embree ("Spectra and pseudospectra"; chap. 4 on transients) which discusses alternative summary scalar measures that are much better indications of expected transient magnitude / durations (i.e. provide useful bounds on transient size & timing). The pseudospectrum and associated Kreiss theorems should probably be mentioned. So I suggest the authors discuss these issues briefly in the intro / discussion. Perhaps a useful place to start is the 2x2 example they have already studied in depth, for which it's not too difficult to find corner cases where the max amplification is bounded whereas the numerical abscissa isn't. E.g. the peak amplification for W = [ 1; -a]; [1; -a] ] is upper bounded by $\\sqrt{2}$ (it tends to $\\sqrt{2}$ as $a \\to \\infty$); yet the numerical abscissa (spectral abscissa of $W+W^T-2I$) grows linearly in $a$ ($\\approx (\\sqrt(2)-1) a$ (the peak of amplification occurs earlier and earlier, too). 2. Going back to the "coding" viewpoint, adopted in the introduction: for the type of deterministic Markovian dynamics considered here, future states are entirely determined by the current one ─ the whole trajectory is entirely determined by the initial condition; so for deterministic dynamics, there can be no advantage of eliciting fancy transients: there is just as much information about the stimulus at the peak of the transient as there was at $t=0$. The advantage of amplification for coding become apparent when noise is taken into consideration ─ but then, it is important to distinguish between process noise (which tends to get amplified together with signal) and observation noise (which doesn't). Nonnormal transients don't help much with process noise (though that depends on its geometry), but can hugely increase the mutual information between stimulus and r(t) in cases when output (observation) noise is strong; having signal rise above noise is a big win. I would recommend discussing this briefly, too. # Minor comments: 1. line 30: > This results leads to a simple distinction between two classes of networks: networks in which all inputs lead to weak, decaying transients, and networks in which specific inputs elicit strongly amplified transient responses. Cf main comment (1) above: this is a little too strong, as the class of networks of the latter type also contains networks that amplify very weakly. 2. line 71: > necessarily implies that the firing rate of at least one neuron shows a transient increase before decaying to baseline. increase, or decrease (but can always be turned into an increase by reversing the sign of the initial condition of course) 3. line 177: > Note that for a normal matrix, the left and right singular vectors $R(t)_k$ and $L(t)_k$ are identical, and the singular values are equal to the eigenvalues [...] corner case: e.g. skew symmetric (hence normal) matrix W has complex eigenvalues, whereas singular values are real 4. regarding the margin: > To eliminate the trajectories with very short amplification, one can further constrain the slopes to be larger than a margin $\\epsilon$ again, this asymptotic result concerns the slope at time t=0; even with a margin, this seems to give no formal guarantee on the amount of amplification that can follow 5. line 228: > Specifically, we wish to determine the minimal connectivity that transiently transforms a fixed, arbitrary input $r_0$ into a fixed, arbitrary output $w$, through two-dimensional dynamics. Considerations of time unclear here → transform $r_0$ into $w$ _at some point_ during the transient? 6. reference missing on line 405 7. reference to figure missing on line 808. Best wishes, Guillaume Hennequin ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: Yes: Guillaume Hennequin 10 Dec 2019 Submitted filename: response.pdf Click here for additional data file. 3 Jan 2020 Dear Dr Bondanelli, Thank you very much for submitting your manuscript, 'Coding with transient trajectories in recurrent neural networks', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved. We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.raised. In addition, when you are ready to resubmit, please be prepared to provide the following: (1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors. (2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text. (3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution. Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are: - Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition). - Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'. - Funding information in the 'Financial Disclosure' box in the online system. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. If you have any questions or concerns while you make these revisions, please let us know. Sincerely, Kenneth D. Miller Guest Editor PLOS Computational Biology Lyle Graham Deputy Editor PLOS Computational Biology A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK] (From K Miller, guest editor) The reviews supported publication of the paper and the authors have responded well to all the review comments. The paper is almost ready for publication. However in scanning through the paper looking at the corrections I noticed a number of very small points that should be revised in a final version: Abstract: "and networks in which specific inputs elicit strongly amplified transient responses and are mapped onto orthogonal output states during the dynamics". Here you are talking about the whole class of networks that show initial amplification, which may not be strongly amplified and need not be mapped onto orthogonal states, so this should be changed, e.g. to "and networks in which specific inputs elicit amplified responses during the dynamics". You could if you want restore the orthogonal concept by inserting 'orthogonal' in "We then build minimal, low-rank networks ... mapping a specific input onto a specific *orthogonal* output state." Line 62: monotonic decay "exploring essentially a single dimension" -- this isn't true, the monotonic decay could curve or spiral through an arbitrarily high-dimensional space. So I suggest cutting this phrase. 'or transiently move away from it by following a rotation': similarly, this need not be a rotation -- for example if in 2D you have two eigenvectors both nearly vertical, and you start with a small horizontal initial condition, and eigenvalues are real and of very different magnitudes, then the trajectory will go up toward the eigenvector with the larger eigenvalue and then decay down along that eigenvector, which is nearly an up and down motion with only a slight rotational component. So I'd suggest cutting "by following a rotation". Line 72: "the firing rate of at least one neuron shows a transient increase (or decrease) before decaying to baseline" -- 'increase or decrease' includes pretty much everything, so this is more or less a tautology. I think what you want to state is that at least one r_i shows a transient increase in its absolute value. You've also gotten yourself into a bit of trouble since you are calling the r_i's the deviation of the firing rate from the fixed point, so you cannot simply refer to them as 'the firing rates', i.e. you can't say 'at least one firing rate increases in its absolute value'. You could get yourself out of this by, under Eq. 1, saying that for simplicity you will refer to the r_i's as the firing rates, in which case you would want to say 'before decaying to zero' instead of 'before decaying to the baseline'. Lines 144-147: "two interacting excitatory-inhibitory populations" is confusing, it sounds like 4 populations, two excitatory and two inhibitory. You could just say 'an excitatory and an inhibitory population'. "our criterion states that the excitatory feedback needs to be (approximately) larger than unity in order to achieve transient amplification (Fig. 2 and Methods)". 'Methods' should instead be Appendix F. But also, this statement isn't correct -- it's only correct when the network is of the form {{w,-k w},{w,-k w}} with k=1+\\epsilon for \\epsilon small. It is certainly not true for the general case of one excitatory and one inhibitory population. The same problem is in the legend of Figure 2, which again states this same criterion, now the form of the matrix is specified but you haven't noted the restriction on k. Lines 315 and 323: I think you want to cite citation [37], not [38] Lines 378-380: "the inhibition-dominated regime, which as we show approximately corresponds to the class of unit-rank E-I networks satisfying the general criterion for transient amplification." This isn't true -- if you assume the form {{w,-k w},{w,-k w}}, you can get transient amplification in a stable network either with k<1 or k>1. (Mathematica tells me: If k<1, the criterion is (2 (-1 + k))/(1 + k)^2 + 2 Sqrt[2] Sqrt[(1 + k^2)/(1 + k)^4] < w < -(1/(-1 + k)); while if k>1, the criterion is w > (2 (-1 + k))/(1 + k)^2 + 2 Sqrt[2] Sqrt[(1 + k^2)/(1 + k)^4].) Methods, Eq. 30: You should note that, *for J stable*, Eq 30 is the criterion for Eq. 28 to be >1, and that the stability condition that Det(J-1)>0 is precisely the condition that the argument of the square root in (30) is positive, so that \\Delta_c is real. Without the condition that J is stable, the criterion for Eq 28 > 1 is more complicated. Appendix F, line 1012 "recovers the results from (32), showing that the system is amplified if the excitatory strength w is (approximately) larger than one." In (32), we showed something different: that for an initial condition of r_E>0, r_I=0, r_E (not |r|) shows transient amplification if and only if w>1, for any value of k. What you are showing is that there will be some initial condition for which |r| will show transient amplification if k=1+\\epsilon and w>1-\\epsilon^2/4 for \\epsilon<<1. These are very different. I'm sorry it took me a while to get to this after the revision came in, due to the holidays. If it comes back with these revisions I promise very quick turnaround. 13 Jan 2020 Submitted filename: response_2.pdf Click here for additional data file. 14 Jan 2020 Dear Dr Bondanelli, We are pleased to inform you that your manuscript 'Coding with transient trajectories in recurrent neural networks' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org). Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology. Sincerely, Kenneth D. Miller Guest Editor PLOS Computational Biology Lyle Graham Deputy Editor PLOS Computational Biology 3 Feb 2020 PCOMPBIOL-D-19-01144R2 Coding with transient trajectories in recurrent neural networks Dear Dr Bondanelli, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Laura Mallard PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

44 in total

1. Memory traces in dynamical systems.

Authors: Surya Ganguli; Dongsung Huh; Haim Sompolinsky
Journal: Proc Natl Acad Sci U S A Date: 2008-11-19 Impact factor: 11.205

2. Balanced amplification: a new mechanism of selective amplification of neural activity patterns.

Authors: Brendan K Murphy; Kenneth D Miller
Journal: Neuron Date: 2009-02-26 Impact factor: 17.173

3. Heterogenous population coding of a short-term memory and decision task.

Authors: Joseph K Jun; Paul Miller; Adrián Hernández; Antonio Zainos; Luis Lemus; Carlos D Brody; Ranulfo Romo
Journal: J Neurosci Date: 2010-01-20 Impact factor: 6.167

4. Dynamic Control of Response Criterion in Premotor Cortex during Perceptual Detection under Temporal Uncertainty.

Authors: Federico Carnevale; Victor de Lafuente; Ranulfo Romo; Omri Barak; Néstor Parga
Journal: Neuron Date: 2015-05-07 Impact factor: 17.173

Review 5. Neural circuits as computational dynamical systems.

Authors: David Sussillo
Journal: Curr Opin Neurobiol Date: 2014-02-05 Impact factor: 6.627

6. Encoding sensory and motor patterns as time-invariant trajectories in recurrent neural networks.

Authors: Vishwa Goudar; Dean V Buonomano
Journal: Elife Date: 2018-03-14 Impact factor: 8.140

7. Inhibitory stabilization of the cortical network underlies visual surround suppression.

Authors: Hirofumi Ozeki; Ian M Finn; Evan S Schaffer; Kenneth D Miller; David Ferster
Journal: Neuron Date: 2009-05-28 Impact factor: 17.173

8. Neural population dynamics during reaching.

Authors: Mark M Churchland; John P Cunningham; Matthew T Kaufman; Justin D Foster; Paul Nuyujukian; Stephen I Ryu; Krishna V Shenoy
Journal: Nature Date: 2012-07-05 Impact factor: 49.962

9. Context-dependent computation by recurrent dynamics in prefrontal cortex.

Authors: Valerio Mante; David Sussillo; Krishna V Shenoy; William T Newsome
Journal: Nature Date: 2013-11-07 Impact factor: 49.962

10. Go/No-Go task engagement enhances population representation of target stimuli in primary auditory cortex.

Authors: Sophie Bagur; Martin Averseng; Diego Elgueda; Stephen David; Jonathan Fritz; Pingbo Yin; Shihab Shamma; Yves Boubenec; Srdjan Ostojic
Journal: Nat Commun Date: 2018-06-28 Impact factor: 14.919

11 in total

1. Nonlinear transient amplification in recurrent neural networks with short-term plasticity.

Authors: Yue Kris Wu; Friedemann Zenke
Journal: Elife Date: 2021-12-13 Impact factor: 8.140

2. Oscillations and variability in neuronal systems: interplay of autonomous transient dynamics and fast deterministic fluctuations.

Authors: Rodrigo F O Pena; Horacio G Rotstein
Journal: J Comput Neurosci Date: 2022-06-02 Impact factor: 1.453

3. An evolving perspective on the dynamic brain: Notes from the Brain Conference on Dynamics of the brain: Temporal aspects of computation.

Authors: Angela J Langdon; Rishidev Chaudhuri
Journal: Eur J Neurosci Date: 2020-10-03 Impact factor: 3.386