We discuss simple models for the transient storage in short-term memory of cortical patterns of activity, all based on the notion that their recall exploits the natural tendency of the cortex to hop from state to state, that is, latching dynamics. We show that in one such model, and in simple spatial memory tasks we have given to human subjects, short-term memory can be limited to a similarly low capacity by interference effects, in tasks terminated by errors, and can exhibit similar sublinear scaling, when errors are overlooked. The same mechanism can drive serial recall if combined with weak order-encoding plasticity. Finally, even when storing randomly correlated patterns of activity the network demonstrates correlation-driven latching waves, which are reflected at the outer extremes of pattern space.
Despite much effort directed towards understanding the neural processes underlying short-term memory (STM), what causes its notoriously limited capacity has, to this day, remained largely mysterious [1-5]. If one were to take a functionalist perspective, inspired e.g. by Baddeley’s theory of working memory [6], and assume that items in short-term memory are transiently represented in a dedicated cortical module, where they have been copied from their long-term traces, two riddles would arise: how would the copying work? and why would this module have such poor capacity? Multiple lines of evidence, particularly since the advent of functional imaging, have however failed to identify an ad hoc STM module, and indicated that STM is expressed by the activity of the same neurons that participate in the representation of long-term memories (LTM) [7]. This disposes of the copy riddle, but emphasizes the capacity one. What makes us able, for example, to recognize tens of thousands of images as familiar [8] and yet unable to detect a change in a configuration of more than a few elements that we have just seen [9]? Focusing on the recall of sequences of well-known items, what makes it so difficult to go, again, beyond very short sequences?
Addressing this riddle with a mathematically well-defined neural network model requires, in our view, a model that, however drastically simplified, captures the widely distributed nature of the cortical representations on which STM as well as LTM can rely. We argue that a Potts network is adequate in this respect [10]. A Potts network can model the long-range interactions among patches of cortex and, without any ad hoc component, shows a tendency to hop spontaneously from activity pattern to activity pattern, recalling them in a sequence resembling a random walk. We call this latching dynamics and propose here that it holds the key to understanding STM limitations, once combined with some mechanism, perforce imprecise, for short-term storage. 
We consider a number of distinct mechanisms of this type, which, by adding an extra “kick” to boost a small subset of L among p patterns in long-term memory, approximately restrict latching dynamics to the subset, which is then effectively kept in short-term memory.
We show that this formulation fits with the general hypothesis that interference between memories is critical [11] as well as with the gist of the recently proposed statistical theory of free recall, as implemented by stochastic trajectories among ensembles of items [12], in fact unifying them: depending on the task, the limiting factor turns out to be either interference from items in long-term memory or the randomness in retrieval trajectories.
While the basic model needs more structure to be predictive about specific behaviour, e.g. in semantic priming experiments [13], or about the effects of item complexity [14] or individual differences [15], and in general to fully benchmark its validity as a model of short-term memory [4], we show that it is consistent with simple experiments that illustrate the way STM limitations depend on task demands. In free recall, where repetitions and mistakes are not penalised, the number M of retrieved items tends to scale sublinearly with L, reflecting largely random exploration. In a task which is terminated by mistakes, instead, capacity is constrained by the interference of other items in long-term memory. Further, modeling serial recall with hetero-associative short-term synaptic enhancement leads to the conclusion that latching dynamics is preserved only if the enhancement is weak, and then it generates limited sequences, similar to those shown by human subjects when asked to serially recall unstructured items, without recourse to LTM aids.
The paper is organized as follows. 
In Models we first review the basics of the Potts network for LTM, and its tendency to latch; those familiar with it may go directly to the next subsection, which compares three ways to harness it for short-term memory, showing how they allow STM limits to be analysed. In Results we look at the performance of the second mechanism in free recall and compare it with experimental data, before modelling serial recall with the same model. The nature of the trajectories the model follows in memory space is further analysed in the last part of Results, with concluding remarks in Discussion.
Models
The Potts network for the storage of long-term memories
A Potts neural network is an autoassociative memory network composed of N Potts units, each of which represents the state of a single patch of the cortex as it contributes to retrieving distributed LTM traces addressed by their contents [16]. Each Potts unit has S active states, indexed as 1, 2, ⋯, S, representing local attractors in that patch, and one background-firing state (no local attractor is activated), the 0 state. The N units interact with each other via tensor connections, which represent associative long-range interactions through axons that travel through the white matter [17], while local, within-gray-matter interactions are assumed to be governed by attractor dynamics in each patch (Fig 1A). The values of the tensor components are pre-determined by the Hebbian learning rule, which can be construed as derived from Hebbian plasticity at the synaptic level [10]
where c_ij is either 1 if unit j gives input to unit i or 0 otherwise, allowing for asymmetric connections between units, and the δ’s are Kronecker symbols. The number of input connections per unit is c_m. The p distributed activity patterns which represent LTM items are assigned, in the simplest model, as compositions of local attractor states ξ_i^μ (i = 1, 2, ⋯, N and μ = 1, 2, ⋯, p). The variable ξ_i^μ indicates the state of unit i in pattern μ and is randomly sampled, independently of the unit index i and the pattern index μ, from {0, 1, 2, ⋯, S} with probability
Constructed in this way, patterns are randomly correlated with each other. We use these randomly correlated memory patterns in this study, but envisage later generalizing it to a set of correlated memory patterns, as produced by the algorithm presented in [18]. The parameter a is the sparsity of patterns—fraction of active units in each pattern; the average number of active units in any pattern μ is therefore given by Na.
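The construction of randomly correlated patterns described above can be illustrated with a minimal sketch. It assumes that each unit is active with probability a and that an active unit picks one of the S active states uniformly at random; the function name `sample_patterns` and its arguments are hypothetical, introduced only for illustration.

```python
import random


def sample_patterns(num_units, num_patterns, num_states, sparsity, seed=0):
    """Sample Potts patterns: 0 is the quiescent state, 1..num_states are active.

    Assumption (not taken from the missing equation): a unit is active with
    probability `sparsity`, and active states are equiprobable.
    """
    rng = random.Random(seed)
    patterns = []
    for _ in range(num_patterns):
        pattern = []
        for _ in range(num_units):
            if rng.random() < sparsity:
                pattern.append(rng.randint(1, num_states))  # one active state
            else:
                pattern.append(0)  # background-firing (0) state
        patterns.append(pattern)
    return patterns


patterns = sample_patterns(num_units=2000, num_patterns=5, num_states=7, sparsity=0.25)
active_fraction = sum(state != 0 for state in patterns[0]) / len(patterns[0])
```

With these assumptions the fraction of active units in each pattern fluctuates around a, so the mean number of active units is close to Na.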
Fig 1
Latching dynamics of the Potts neural network.
(a): The Potts network encapsulates local attractor dynamics within cortical patches into Potts spins and describes attractor dynamics in the global network of the cortex by means of a network of Potts units. (b): Phase diagram of a Potts neural network in w − γ plane. The x-axis is γ, the proportion of fast inhibition in the dynamics. γ = 0 (1) means only slow (fast) inhibition. The y-axis is the self-reinforcement parameter w. In false color, the proportion of simulations that exhibit finite latching. Increasing w, in fact, one observes different latching phases: no latching (noL), finite latching (L), infinite latching (infL) and stable attractor phase (SA). White circles indicate four points, where examples of latching sequences are shown in the bottom panels, all produced with time constants τ1 = 0.01s, τ2 = 0.2s and τ3 = 100s. The x-axis corresponds to time, and the y-axis to the overlap, each colour with an item in long-term memory. (c): For too low w, in the no latching phase, there is only retrieval and the network cannot latch onto another pattern. (d): Increasing w, one reaches the finite latching phase, where the network retrieves a finite sequence of patterns, with high overlap. (e): Increasing w further, one reaches the infinite latching phase, where sequences are indefinitely long but the quality of latching is degraded. The mean dwell time in a pattern is also increased compared with the finite latching regime. (f): Increasing w even further, one gets to the stable attractor phase, where the network retrieves the cued pattern and cannot escape from that attractor.
Local network dynamics within a patch are taken to be driven by the input that the unit i in state k receives
where the local feedback w, introduced in [19], models the depth of attractors in a patch, as shown in [10]—it helps the corresponding Potts unit converge to its most active state. The activation along each state for a given Potts unit is updated with a soft max rule
where U is a fixed threshold common to all units and β measures the level of noise in the system. Note that the activation σ_i^k takes continuous values in (0, 1) and that Σ_k σ_i^k = 1, with the sum running over k = 0, 1, ⋯, S, for any i. The variables r_i^k, θ_i^A and θ_i^B parametrize, respectively, the state-specific potential, fast inhibition and slow inhibition in patch i. The state-specific potential integrates the input by
where the variable θ_i^k is a specific threshold for unit i and for state k. If it were constant in time, the Potts network would simply operate as an autoassociative memory with extensive storage capacity [20].
Taking the threshold to vary in time to model adaptation, i.e. synaptic or neural fatigue selectively affecting the neurons active in state k, and not all neurons subsumed by Potts unit i
the Potts network additionally expresses latching dynamics, the key to its possible role in short-term memory.
The unit-specific thresholds θ_i^A and θ_i^B describe local inhibition, which in the cortex is relayed by at least 3 main classes of inhibitory interneurons [21] acting on GABAA and GABAB receptors, with widely different time courses, from very short to very long. In the Potts network it has proved convenient, in order to separate time scales, to consider either very slow or very fast inhibition [19, 22]. Here, we consider a more realistic case in which both slow and fast inhibition are taken into account. Formally, we have two inhibitory thresholds θ_i^A and θ_i^B (to denote fast, GABAA and slow, GABAB inhibition, respectively) that vary in the following way:
where one sets τA < τ1 ≪ τ2 ≪ τB and the parameter γ sets the balance of fast and slow inhibition. If γ = 0, we have only slow inhibition in the network. If γ = 1, we have only fast inhibition. We have both for 0 < γ < 1. In this way, we make a small step towards a realistic network, while maintaining relative mathematical simplicity and the ability to apply a separation of time scales to better understand the phenomenology.
We define an order parameter called overlap, which measures how close the network state is to each pattern.
The overlap m is normalised in such a way that it takes the value of 1 when the network state is fully aligned with one pattern.
With adaptation, the Potts network has four different phases of operation in the w − γ phase space (Fig 1B). The first one is the trivial no latching phase, where the network operates just as an autoassociative (long-term) memory, with large storage capacity, but dynamics stop after the retrieval of the cued pattern. The Potts network undergoes a phase transition when one of the network parameters is changed (e.g., the local feedback w in Fig 1B). Above a phase transition, the network spontaneously latches, i.e., it generates a sequence of items, clearly defined but limited in length in the finite latching phase, and indefinite but progressively less well defined in the third phase, the infinite latching one, in which latching dynamics go on indefinitely after the initial associative retrieval. In the fourth phase the retrieved pattern is not destabilised by adaptation, and remains as a steady state. We call this the stable attractor phase.
As the network hops from memory to memory, it can simulate free recall. This happens if latches are concentrated onto STM items, but otherwise free, i.e., not coerced by external agents. Key to such latching dynamics is that the specific thresholds θ_i^k inactivate, when rising, only the corresponding attractor state and not the cortical patch tout court, allowing for a large variety of ensuing trajectories.
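As an illustration of the normalisation of the overlap, the following minimal sketch computes m for one pattern. The specific form used, a sum of (δ − a/S) times the activation over active states, divided by Na(1 − a/S), is an assumption borrowed from the usual Potts-network formulation rather than taken from this text; the function name `overlap` is likewise hypothetical.

```python
def overlap(activations, pattern, num_states, sparsity):
    """Overlap between a network state and one stored pattern.

    activations[i][k] is the activity of unit i along state k (k = 0 is the
    quiescent state); pattern[i] is the state of unit i in the pattern.
    The normalisation N * a * (1 - a/S) is an assumed form.
    """
    num_units = len(pattern)
    a_over_s = sparsity / num_states
    total = 0.0
    for sigma_i, xi in zip(activations, pattern):
        for k in range(1, num_states + 1):  # only active states contribute
            indicator = 1.0 if xi == k else 0.0
            total += (indicator - a_over_s) * sigma_i[k]
    return total / (num_units * sparsity * (1.0 - a_over_s))


def aligned_state(pattern, num_states):
    """Network state in which every unit sits exactly in its pattern state."""
    return [[1.0 if k == xi else 0.0 for k in range(num_states + 1)] for xi in pattern]


# Toy pattern with exactly N*a active units (N = 8, a = 0.25, S = 4).
toy_pattern = [1, 3, 0, 0, 0, 0, 0, 0]
m_aligned = overlap(aligned_state(toy_pattern, 4), toy_pattern, 4, 0.25)
```

Under this assumed normalisation, a state fully aligned with a pattern that has exactly Na active units gives m = 1, while a fully quiescent network gives m = 0.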
Incorporating short-term memory function
The Potts network has so far been studied as a model of long-term memory, but it can be tweaked in minimal ways to serve also as a model of short-term or working memory. While it remains a simple object to study, it demonstrates how memory operating on widely different time scales can utilise the very same neural representations and the same associative mechanisms, based on plausible and unsupervised synaptic plasticity rules.
The core idea is that a few memory items, or sequences of items, are strengthened by increasing the value of some pre-existing parameter (Fig 2A). The increase, which cannot be presumed to be precisely determined, should be in any case moderate, to effectively bring only those items across a network phase transition, into a phase in which they or their sequences are held in short-term memory, effectively separate from the ocean of all items and all possible sequences in long-term memory (Fig 2B). So it is just an extra boost, without adding new components. The increase or extra boost is assumed to be temporary, and once it subsides, the short-term memory has vanished. A critical assumption is that, since whatever plasticity in the brain serves as the extra boost has a transient time course, we should model it by modifying parameters in simple and coarse ways. This contrasts with what we assume to happen when encoding long-term memories, which in principle can be refined over many repetitions or recall instances, and can therefore be taken to reflect very precisely set parameters, down to the level of individual synaptic efficacies.
Fig 2
Different models for holding items in STM yield qualitatively different recall performance.
(a): Schematic of the way STM is implemented in the three models. Model 1 acts at the unit level, Model 2 at the Potts-state level, and Model 3 at the synapse level. (b): Schematic diagram of models for STM. The STM function is produced by a “boost” Δx in the parameter x, representing w, θ and J for Model 1, 2, and 3, respectively. (c): The quantity ΔM has a maximum at around L ≃ 32 for Model 2 and 3b and it continues to grow for Model 3a, while it remains always close to zero for Model 1. The abscissa is L, the number of items in STM, in log scale. The ordinate is ΔM ≡ Mcorr(Δx = 0.3) − Mcorr(Δx = 0.0), where Mcorr is the number of recalled STM items until the network either repeats an already-visited item or (mistakenly) retrieves one of the LTM items. (d): The different propensity to latch, i.e., to make transitions, is quantified by the number of latches per sequence, plotted as a function of L for the 3 models, in a log-log scale. The strength of the boost is, again, Δx = 0.3 for each model. The horizontal dashed line indicates the number of latches per sequence when all p patterns are on equal footing, i.e., there is no boost. (e): The proportion of resources utilised in the models predicts the peak of the performance ΔM. The dashed horizontal line indicates the proportion equal to 1/2. Across all 3 panels, parameters are p = 200, S = 7, a = 0.25, γ = 0.5 and w = 1.1.
Different neural-level mechanisms can constrain latching dynamics to a small subset of activity patterns that represent items in long-term memory. It can be envisaged that several of them may operate in synergy. Here we analyse three, which can be simply associated with distinct parameters of the Potts network, and we consider each mechanism separately from the other two, to demonstrate its characteristics (Fig 2A). The parameters we focus on are the degree of local feedback (Model 1), the local adaptive thresholds (Model 2) and the strength of long range connections (Model 3). 
In each case, a single parameter is therefore varied across many network elements, so that L patterns, those supposed to be held in short-term memory, are driven into the latching regime (Fig 2B). This change, which embodies short-term storage, should avoid pushing into the latching regime also the other p − L patterns, but to some extent their involvement is unavoidable, as will be shown.
Model 1: Stronger local feedback for the items held in STM
The first mechanism models increased depth of the attractors in the patches of cortex where any of the L patterns is active, which could reflect a generic short-term potentiation of the synaptic connections among pyramidal cells in those patches, which in the Potts network is summarily represented by the parameter w [10, 19]. In the model, each of the L items is active over aN Potts units, and their active states are shared with many other items not intended to go into STM. This is the coarseness that leads to limited capacity of memory: if L is too large, virtually all of the units are given the boost, all with the same strength, and no distinction between the L selected items and the other p − L remains. Formally, instead of a common w for all Potts units, we introduce
where ξ_i^μ is the state of pattern ξ^μ at unit i, Θ(⋅) is the Heaviside step function and δ is the Kronecker delta symbol.
If a unit i participates in the representation of any one of the L patterns in STM, then w_i = w + Δw. If not, w_i = w.
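The rule stated above for Model 1 can be sketched in a few lines. The function name `local_feedback_per_unit` and the list-of-lists pattern layout are hypothetical choices made for illustration only; the rule itself (a unit gets w + Δw if it is active in any STM pattern, and w otherwise) is the one given in the text.

```python
def local_feedback_per_unit(patterns, stm_indices, w, delta_w):
    """Model 1 sketch: unit-specific local feedback.

    patterns[mu][i] is the state of unit i in pattern mu (0 = quiescent).
    A unit receives the boost if it is active in at least one STM pattern.
    """
    num_units = len(patterns[0])
    feedback = []
    for i in range(num_units):
        in_stm = any(patterns[mu][i] != 0 for mu in stm_indices)
        feedback.append(w + delta_w if in_stm else w)
    return feedback


# Three toy patterns over three units; patterns 0 and 1 are held in STM.
toy_patterns = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
w_per_unit = local_feedback_per_unit(toy_patterns, stm_indices=[0, 1], w=1.1, delta_w=0.3)
```

In this toy case the first two units are boosted and the third is not. Because the boost ignores which state a unit takes, it is shared with every other pattern that uses the same units, which is the coarseness discussed above.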
Model 2: Lower adaptive threshold for the items held in STM
In the second mechanism, a parameter regulating firing rate adaptation is reduced selectively for the neurons that are active, in those patches, in the representation of the L items. That is, we decrease adaptation by subtracting from the adapted threshold (θ_i^k) a term Δθ, for the Potts states that are active in any one of the L patterns.
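A minimal sketch of the selection rule of Model 2 follows. It only builds the table of threshold reductions, one entry per unit and state; how this table enters the threshold dynamics is not shown. The function name `threshold_reduction` and the data layout are hypothetical.

```python
def threshold_reduction(patterns, stm_indices, num_states, delta_theta):
    """Model 2 sketch: reduction applied to the adaptive threshold of each
    (unit, state) pair.

    Entry [i][k] equals delta_theta if state k of unit i is active in at
    least one STM pattern, and 0 otherwise. Index 0 (quiescent state) is
    never boosted.
    """
    num_units = len(patterns[0])
    reduction = [[0.0] * (num_states + 1) for _ in range(num_units)]
    for mu in stm_indices:
        for i, state in enumerate(patterns[mu]):
            if state != 0:
                reduction[i][state] = delta_theta
    return reduction


toy_patterns = [[1, 0, 2], [1, 3, 0], [2, 2, 2]]
reduction = threshold_reduction(toy_patterns, stm_indices=[0, 1], num_states=3, delta_theta=0.3)
```

Compared with the unit-level boost of Model 1, the selection here is finer by a factor of roughly S, since only the specific state of each unit is affected.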
Model 3: Stronger long-range connections for the items held in STM
The third mechanism we consider is the one acting on the long-distance synaptic connections between neurons, represented in the Potts network [10] by the tensor connections between Potts units. We model short-term potentiation of the synaptic connections by stronger tensor connections. Since the latter connect separate Potts units, however, in order to specify exactly which tensor elements are considered to be potentiated, we have to specify whether the L patterns, in the task, are taken to be stored simultaneously. We consider two opposite cases. If they are assumed to be all stored at separate times, the stronger tensor elements are those that connect Potts states of two units both active in any one of the L patterns. If they are assumed to be all stored in STM together, the stronger elements are all those that connect Potts states of two units both active in any pair of the L patterns. We call them variants a and b of Model 3.
Model 3a: Model 3 with only autoassociative connections in short-term memory
where J_ij^kl is the strength of connections that do not belong to any one of the L patterns in STM, given in Eq (1). Here we say that a connection belongs to a pattern when the two states that are paired by the connection participate in the representation of the pattern.
Model 3b: Model 3 with all associative connections among STM items
where J_ij^kl is again given in Eq (1). In this model, we potentiate extra connections in addition to those that are potentiated in Model 3a. These are the so-called heteroassociative connections that connect Potts states of one item to those of another item in STM.
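The difference between the two variants can be made concrete with a short sketch of which tensor elements are selected for potentiation. The function names and data layout are hypothetical; the selection criteria follow the verbal definitions above (same STM pattern for variant a, any pair of STM patterns for variant b).

```python
def boosted_3a(patterns, stm_indices, i, k, j, l):
    """Model 3a: the connection between state k of unit i and state l of
    unit j is boosted if both states are active in the SAME STM pattern."""
    if k == 0 or l == 0:
        return False  # the quiescent state carries no connection
    return any(patterns[mu][i] == k and patterns[mu][j] == l for mu in stm_indices)


def boosted_3b(patterns, stm_indices, i, k, j, l):
    """Model 3b: boosted if each state is active in SOME STM pattern,
    not necessarily the same one (auto- plus heteroassociative elements)."""
    if k == 0 or l == 0:
        return False
    state_k_used = any(patterns[mu][i] == k for mu in stm_indices)
    state_l_used = any(patterns[nu][j] == l for nu in stm_indices)
    return state_k_used and state_l_used


toy_patterns = [[1, 2], [3, 4]]  # two patterns over two units
same_pattern = boosted_3a(toy_patterns, [0, 1], 0, 1, 1, 2)   # both from pattern 0
cross_pattern = boosted_3b(toy_patterns, [0, 1], 0, 1, 1, 4)  # pattern 0 with pattern 1
```

Every element selected by variant a is also selected by variant b, so variant b always potentiates at least as many connections.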
Different models for holding items in STM are differentially effective
For the sake of a fair comparison among the mechanisms (Models 1, 2 and 3), we equalize the values of all parameters as they affect the L patterns, so that in practice, rather than bringing them into the latching regime, which is what should happen in the real process, in our model evaluation we push the other p − L out, or partially out, in different directions.
We first consider how effective the three mechanisms are in constraining latching dynamics to the L items in STM. We find that for Models 2 and 3b, latching dynamics are effectively constrained to the L items, but only up to a given value of L (see Fig 2C, where we have shown the result for specific values of the parameters, e.g. Δx = 0.3, but those are representative of a broad range, as shown in S1–S4 Figs). The effectiveness is measured, in Fig 2C, by a quantity called Mcorr, which is the number of recalled STM items until the network either repeats one of the already-visited items or retrieves one of the LTM items. We then consider the difference between this quantity and the value it would have without any differentiation between the L and the other items, ΔMcorr ≡ Mcorr(Δx = 0.3) − Mcorr(Δx = 0.0); this subtraction of the chance level quantifies the genuine effect of Δx. Here x represents w, θ and J for the 3 models, respectively. When we increase L, there are two main factors that affect Mcorr. The first one is the exploration by the trajectory, resembling that of a random walk, which increases Mcorr. Due to this effect Mcorr should grow like √L as a function of L (see S1 Appendix) if there are no errors, i.e. recall of items that are not in short-term memory. The occurrence of errors is the second factor that affects Mcorr, progressively more as L increases. When L is small, the first factor dominates and as a result, Mcorr grows. Beyond a certain value of L, there is an avalanche of errors as there are many LTM patterns that are kicked as strongly as those in STM. 
This avalanche of errors causes the sudden drop of ΔMcorr seen for Model 2 and 3b in Fig 2C. We can attempt to understand this limitation as being due to interference from the LTM items, which start to dominate the dynamics at different values of the list size L. To illustrate this, let us consider the proportion of elements (units, states and connections for Model 1, 2 and 3, respectively) that are enhanced for a given number L. If we randomly pick, respectively, one unit, state or connection, then the probability of it belonging to one of the L patterns in STM can be written, respectively, for Models 1, 2, 3a and 3b:
All of these quantities approach 1 when L becomes very large, as all elements become used towards encoding the list in STM. As a rough estimate, we can set a somewhat arbitrary criterion of P = 1/2, above which more than half of all elements are used, and the network cannot easily discriminate STMs from LTMs. We can then roughly estimate the “critical” value of L, Lc, at which P reaches this criterion, with which we obtain, using the parameters for which we run the simulations (S = 7, a = 0.25):
The considerations above point to the different values of the critical list length Lc obtained through the different models. This is to be expected as the different models act on different elements of the network. Model 1 has very limited capacity to constrain latching dynamics, in that interference effects occur already for low values of L. In contrast, Models 2 and 3b yield broadly similar values, whereas Model 3a, acting on the long-range connections, is not affected by interference until much higher L values. This is because in this case, the boost is affecting a subset of the very many NCS(S − 1)/2 tensor connection values (Fig 2E). Note that increasing the strength of the “boost” does not affect the critical list length Lc (S1–S4 Figs).
However, the different manipulations intended to add short-term functionality to the network also affect its regime of operation, such that its ability to spontaneously recall, or latch, is altered, affecting the length of the sequences produced by the network [19, 23]. The Potts network becomes able to serve STM functions once it undergoes a phase transition from the no-latching phase to the finite-latching phase. For this reason we are also interested in the propensity to latch expressed by each of our models. To investigate this propensity to latch, we first cue the network with one of the memorised patterns, after which we count the total number of transitions that occur until the dynamics stop on their own (Fig 2D). 
We can see that with Model 2, constraining the dynamics to be among the L items actually enhances the length of the sequences, whereas the opposite is true, at least up to moderate values of L, for Model 3 (and incidentally, for Model 1). This is because for Model 2, the direct manipulation of the adaptive threshold screens its “refractory” effect, affecting also sequence length. The same does not hold for Model 3, in which the adaptive threshold is not manipulated. We deduce that two aspects of Model 2 are relevant as a model of short-term memory. First, the “coarseness” of Model 2 yields a limit to the list size that can be effectively enhanced. Second, the basic propensity to latch also falls off with increasing list size, reminiscent of the slowing down of retrieval from memory as the set size increases [4]. Note that the representation of objects has been found to be “enhanced” in working memory tasks [24], likely with higher neural activity in the participating units [25], broadly consistent with Model 2. Therefore, in the remainder of this work, we focus on Model 2.
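The argument above about the proportion of boosted elements and the critical list length can be made concrete with a short calculation. The expressions below are a sketch derived here under the assumption that patterns are sampled independently, with each unit active with probability a and each active state chosen with probability a/S; they are not copied from the text, and the function names are hypothetical.

```python
import math


def boosted_fraction(model, list_length, sparsity, num_states):
    """Probability that a random unit (Model 1), state (Model 2) or
    connection (Models 3a, 3b) is boosted by a list of `list_length` items,
    assuming independently sampled patterns."""
    q = sparsity / num_states  # probability that a given state is active in one pattern
    if model == "1":
        return 1.0 - (1.0 - sparsity) ** list_length
    if model == "2":
        return 1.0 - (1.0 - q) ** list_length
    if model == "3a":  # both states active in the same pattern
        return 1.0 - (1.0 - q * q) ** list_length
    if model == "3b":  # each state active in some pattern of the list
        return (1.0 - (1.0 - q) ** list_length) ** 2
    raise ValueError(f"unknown model: {model}")


def critical_list_length(model, sparsity, num_states, criterion=0.5):
    """List length at which the boosted fraction reaches `criterion`."""
    q = sparsity / num_states
    if model == "1":
        return math.log(1.0 - criterion) / math.log(1.0 - sparsity)
    if model == "2":
        return math.log(1.0 - criterion) / math.log(1.0 - q)
    if model == "3a":
        return math.log(1.0 - criterion) / math.log(1.0 - q * q)
    if model == "3b":
        return math.log(1.0 - math.sqrt(criterion)) / math.log(1.0 - q)
    raise ValueError(f"unknown model: {model}")


critical = {m: critical_list_length(m, sparsity=0.25, num_states=7) for m in ("1", "2", "3a", "3b")}
```

With S = 7, a = 0.25 and a criterion of one half, this sketch gives critical list lengths of roughly 2.4, 19, 543 and 34 for Models 1, 2, 3a and 3b, respectively. These numbers come from the assumed expressions above and should be read as order-of-magnitude illustrations of the ranking among models, not as the values reported by the authors.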
Results
Can “free recall” by the Potts network model experimental data?
Having discussed three different models for short-term recall, we study Model 2 in detail, and focus now on a specific paradigm, free recall. In free recall, participants are given a list of items to remember, and are then immediately asked to recall the items, in the order they wish. Experimental data from decades ago show that the number of items recalled from memory follows a power law in the list length [8, 26]. To explain this finding and more generally to investigate the putative mechanisms that could hinder recall, a theoretical model for memory recall has been proposed. We refer to this model as the SAM++ model, as it was developed by Sandro Romani, Misha Tsodyks and colleagues [12, 27], with some roots in the SAM theory of Raaijmakers and Shiffrin [28], which however does not envisage the deterministic loops that terminate the search dynamics in the SAM++ model. In this model, L STM items are drawn from a virtually unlimited reservoir of (LTM) memory items. Transitions are defined to occur deterministically between items that have the largest similarity; as a consequence, recall trajectories always enter a loop, at which point old items are repeatedly recalled, and no new items are recalled beyond the number R reached with those in the loop. Given such simple transition rules, the power-law dependence can be derived (a similar derivation can be found in S1 Appendix). In a more recent study, this power law dependence has been observed for lists of up to 512 words [12].
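The deterministic transition rule just described can be illustrated with a toy simulation. This is only a sketch under stated assumptions: similarities are drawn as a random symmetric matrix, and the next item is taken to be the most similar one excluding the item just visited; neither choice is specified in this text, and the function name `deterministic_recall` is hypothetical.

```python
import random


def deterministic_recall(num_items, seed=0):
    """Toy deterministic recall: always move to the most similar item,
    excluding the current item and the one visited just before.

    Returns the number of distinct items recalled before the trajectory
    repeats a transition, i.e. enters a loop.
    """
    rng = random.Random(seed)
    # Assumed similarity structure: symmetric matrix with uniform random entries.
    sim = [[0.0] * num_items for _ in range(num_items)]
    for i in range(num_items):
        for j in range(i + 1, num_items):
            sim[i][j] = sim[j][i] = rng.random()

    current, previous = 0, None
    visited = {current}
    seen_transitions = set()
    while True:
        candidates = [j for j in range(num_items) if j != current and j != previous]
        if not candidates:
            break  # lists of one or two items leave nowhere to go
        nxt = max(candidates, key=lambda j: sim[current][j])
        if (current, nxt) in seen_transitions:
            break  # the same transition recurs, so the dynamics have closed a loop
        seen_transitions.add((current, nxt))
        visited.add(nxt)
        previous, current = current, nxt
    return len(visited)


recalled = deterministic_recall(num_items=64, seed=1)
```

Because the next item depends only on the current and previous items, a repeated transition guarantees that the trajectory has entered a loop, which is why it can serve as a stopping criterion here.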
If limited by repetitions, the network can recall up to √L STM items
In contrast to the SAM++ model mentioned above, the dynamics in the Potts network model are not deterministic (we will discuss this point in the subsection on free recall, below), and we hardly ever observe a loop in the network trajectories; hence we cannot apply quite the same stopping criterion to determine how many items have been recalled in a simulation. However, we can still compute a measure somewhat similar to R, labeled Mit, as the number of retrieved patterns until the network repeats one transition, which would be the first element in a loop, given deterministic dynamics. Compared to ln R ∝ 0.5 ln L (see [12]), Mit has a steeper scaling with L, but still sublinear (Fig 3A). Alternatively, we can look at the number Mi1 of retrieved items until the network simply revisits one of those already visited. In contrast to Mit, Mi1 grows more slowly than the square root of L (Fig 3A). To obtain an intermediate behaviour, we can then define a third measure Mi, as the number of recalled items until one item is repeated twice. This somewhat contrived quantity indeed behaves similarly to what is theoretically expected of R(L), that is, with a slope of 0.5 in a log-log plot (Fig 3B).
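The three stopping criteria defined above can be written down explicitly for a sequence of retrieved items. The sketch below assumes that extra-list items have already been removed from the sequence and that each measure counts distinct items retrieved before the stopping event; the function names are hypothetical.

```python
def items_until_first_revisit(sequence):
    """Mi1: distinct items retrieved before any item is visited a second time."""
    seen = set()
    for item in sequence:
        if item in seen:
            break
        seen.add(item)
    return len(seen)


def items_until_repeated_transition(sequence):
    """Mit: distinct items retrieved before any ordered transition recurs."""
    seen_items = set()
    seen_transitions = set()
    previous = None
    for item in sequence:
        if previous is not None:
            if (previous, item) in seen_transitions:
                break
            seen_transitions.add((previous, item))
        seen_items.add(item)
        previous = item
    return len(seen_items)


def items_until_second_repeat(sequence):
    """Mi: distinct items retrieved before any item is visited a third time."""
    counts = {}
    for item in sequence:
        if counts.get(item, 0) == 2:
            break
        counts[item] = counts.get(item, 0) + 1
    return len(counts)


example = [1, 2, 3, 1, 4, 2, 3, 5, 1, 6]
mi1 = items_until_first_revisit(example)          # stops at the second visit to item 1
mit = items_until_repeated_transition(example)    # stops when transition 2 -> 3 recurs
mi = items_until_second_repeat(example)           # stops at the third visit to item 1
```

On this example sequence the three measures take the values 3, 4 and 5, respectively, showing how the three criteria terminate the same trajectory at successively later points.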
Fig 3
Whether limited by repetitions or in duration, Potts free recall approaches a √L dependence.
The dashed gray line is the theoretical prediction of R in [12]. Both axes are in a log scale. (a): Mit is the number of recalled STM items until one transition is repeated. Mi1 is the number of recalled STM items until one of the visited STM items is revisited. Dotted curves are for slow inhibition (γ = 0.0), dashed curves for fast inhibition (γ = 1.0), and solid ones for the intermediate regime (γ = 0.5). (b): Mi, the number of recalled STM items until one of them is repeated twice. In contrast to the two measures plotted in (a), this quantity approaches a square root dependence with L. (c): Mu, the number of recalled STM items within a given number of latches, g(L), is plotted as a function of L in log-log scale. We consider three different functions for g(L): logarithmic, linear and constant, denoted by dots, squares and diamonds, respectively, for γ = 0.5.
In computing these three measures, we have ignored errors (extra-list items) in order to compare with [12, 27]. Note that errors are not discussed in [12, 27], in which retrieval of extra-list words is simply dismissed as irrelevant. The beauty of their treatment, in fact, stems from the simple question they pose, without getting into how the recall process happens dynamically in the brain and how LTMs affect free recall performance. These questions are those we address here, however.
Moreover, we see that whether we consider only very slow or only very fast inhibition, as in previous analytical studies [19, 22], or a more plausible balance of the two, the network behaves similarly in terms of short-term memory function. Based on this observation, hereafter we only concentrate on the balanced, or intermediate regime (γ = 0.5).
If limited by duration, the network can again recall up to √L STM items
In the free recall experiment conducted in [12], R was computed as the number of correctly recalled words (or sentences), ignoring errors and repetitions. The time allocated to recall started from 1 minute and 30 seconds for L = 4, and was increased by the same amount each time the length of the list was doubled. As it is problematic to establish a correspondence between human recall time and simulation time in the Potts network, we define another quantity: Mu, the number of correctly retrieved items, ignoring errors and repetitions, within a given number of consecutive latches, denoted by g(L). Given the stochasticity of the network dynamics in visiting pattern space, the specific choice of g(L) has implications for Mu. We attempt to obtain a reasonable comparison with the results in [12] by writing g(L) = 4 log2(L) − 2. We find that this measure has a slope of approximately 0.5 (Fig 3C). However, if g(L) = L, i.e., a linear function of L, Mu has a higher slope. Finally, if we set g(L) to g(Lmax) = 22, with Lmax ≡ 64, i.e. constant and equal to the maximum number of latches in the logarithmic option, then Mu becomes slightly larger for intermediate values of L, suggestive of a drop after hitting a maximum. This again indicates that the Potts network can capture the empirical √L trend, provided one adopts a suitable rule for limiting the length of latching sequences. Of course, in experiments, limiting the time available to subjects also imposes implicit limits on the errors and repetitions they can make.
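The duration-limited measure can be sketched as follows. The logarithmic rule g(L) = 4 log2(L) − 2 is taken from the text; treating "a given number of consecutive latches" as the first g(L) retrieved patterns of a sequence is an assumption made here, and the function names are hypothetical.

```python
import math


def g_log(L):
    """Latch budget g(L) = 4 log2(L) - 2, rounded to an integer."""
    return int(round(4 * math.log2(L) - 2))


def duration_limited_recall(sequence, stm_items, n_latches):
    """M_u: distinct in-list items among the first n_latches retrievals.

    Extra-list items (errors) and repetitions are simply ignored.
    """
    window = sequence[:n_latches]
    return len({item for item in window if item in stm_items})
```

With this rule the budget doubles its increment with each doubling of the list: `g_log(4)` is 6 and `g_log(64)` is 22, the latter matching the constant budget g(Lmax) = 22 quoted above. Swapping `g_log(L)` for `L` or for the constant 22 gives the other two options considered in the text.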
Free recall of nodes on a 2D grid also shows a √L dependence
That the various M measures obey quasi-square-root functions of L may be partially understood by considering a random walk in pattern space, with equally probable visits to each of the patterns (S5 Fig) [29, 30]. Inspired by this observation, we have designed simple experiments in which subjects are asked to remember a random trajectory on a 2-dimensional grid (Fig 4A). We then asked participants to freely recall the positions of the presented dots by clicking on their positions on the grid.
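The random-walk intuition can be checked numerically with a deliberately simplified baseline: items are drawn independently and uniformly from the L list items, and we count how many distinct items are visited before the first revisit. This is the classical birthday problem, for which the expected count grows as √(πL/2) for large L. The independence of successive draws is an assumption of this sketch; the Potts network additionally avoids close repetitions and is biased by correlations between patterns.

```python
import random


def mean_distinct_before_repeat(L, trials=20000, seed=0):
    """Average number of distinct items drawn before the first revisit.

    Assumption: successive items are i.i.d. uniform over the L list items.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        while True:
            item = rng.randrange(L)
            if item in seen:
                break
            seen.add(item)
        total += len(seen)
    return total / trials
```

Under this baseline, quadrupling L roughly doubles the count, which corresponds to a slope close to 0.5 in a log-log plot.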
Fig 4
Free recall of locations in a 2D grid also shows an approximate √L dependence.
(a): The 2D grid used in the free recall experiment. Yellow dots show one example of stimuli with L = 8. (b): MR, the average number of correctly recalled locations in our experiment, is shown by the height of pink bars in a log-log scale. The distance from the bar to the dot of the same colour corresponds to the standard deviation of the mean. Results of 40 participants are pooled together. The same quantity MR is computed, from simulating Model 2, as the number of correctly retrieved STM items within a given number of consecutive latches set as 2L − h(t|L), where h(t|L) is the number of correctly recalled STM items up to that point in time (blue bars). The dashed gray line is the theoretical prediction of R in [12]. Both results, from our experiment and the Potts network, show an approximate √L trend.
Clearly, the parameters of the experimental protocol can be expected to affect recall, including the amount of time allocated for recall. However, in our experiment, participants only need to click on the correct locations (as opposed to typing in the words they recall [12]), and setting a fixed recall time may seem ad hoc. As an alternative, and to further explore the validity of latching dynamics as a model for this experiment, we give participants a limited number of clicks per trial, set as 2L − h(t|L), where h(t|L) is the number of correctly recalled dots up to that point in time. Then we compute MR, defined as the number of correctly recalled dots for a given L ignoring errors and repetitions, and compute the same measure from simulations with the Potts network (see Methods for a description of the experiment).
We find a reasonable agreement between the performance of the Potts network and human subjects in our experiment, where both show a slope of approximately 0.5 (Fig 4B). 
This suggests that latching dynamics capture some aspects of the underlying neural mechanisms of free memory recall, related to the random walk nature of the trajectory, although the exact details depend on the paradigm.
If limited by errors, the network cannot recall beyond its STM capacity
The measure Mcorr was introduced and discussed above to compare three different models. Here we compute the same quantity with a slight modification; in order to compare with our experimental data, we consider sequences of variable length that depends both on list length L and time. We consider again lengths g(L) = 2L − h(t|L), where h(t|L) is the number of correct STM items already retrieved; within this sequence we count the number of correctly retrieved STM items up to the first error or repetition. We compute this quantity for several values of Δθ in the Potts network. The behaviour of Mcorr with respect to L is qualitatively similar to that of the experimental curve for a broad range of Δθ values (see Fig 5A). For all values of Δθ, Mcorr saturates, reaching a maximum that is similar to that of the experimental data, of around 8 items correctly recalled. Exceptions are at the two extremes: too small and too large values of Δθ lead to a lower capacity of the Potts model, below 7 items.
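The difference between the error-tolerant and the error-limited measure can be sketched on a single sequence of retrieved items (or clicks). The reading of the budget rule used here, namely that a trial ends once the number of retrievals used reaches 2L − h(t|L), is an assumption, as are the function name and the input format.

```python
def budgeted_recall(sequence, stm_items):
    """Return (M_R, M_corr) for one trial under the 2L - h(t|L) budget.

    M_R counts distinct in-list items recalled within the budget, ignoring
    errors and repetitions. M_corr counts them only up to the first error
    (extra-list item) or repetition.
    Assumption: the trial stops once the retrievals used reach 2L - h.
    """
    L = len(stm_items)
    correct = set()  # distinct in-list items so far; h(t|L) = len(correct)
    m_corr = None
    used = 0
    for item in sequence:
        if used >= 2 * L - len(correct):
            break  # budget exhausted
        used += 1
        if item in stm_items and item not in correct:
            correct.add(item)
        elif m_corr is None:
            m_corr = len(correct)  # first error or repetition freezes M_corr
    if m_corr is None:
        m_corr = len(correct)
    return len(correct), m_corr
```

For a list `{0, 1, 2, 3}` and the sequence `[0, 1, 7, 2, 0, 3, 5, 5, 5]`, the extra-list item 7 freezes the error-limited count at 2, while the error-tolerant count reaches 3 before the budget runs out, so the function returns `(3, 2)`.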
Fig 5
An error-limited measure of recall has a maximum value.
Two measures, Mcorr and MR, are shown for several values of Δθ, coded by colours. Black dotted curves are the experimental results of free recall of locations in a 2-dimensional grid. (a): Mcorr has a maximum value. It is the number of recalled STM items until the network either revisits one of the already-recalled STM items or visits one of the LTM items, but within a given number of latches, 2L − h(t|L), where h(t|L) is the number of correctly recalled STM items up to that point in time. (b): MR shows a scaling behaviour. MR is the number of recalled STM items, ignoring repetitions and errors, within a given number of consecutive latches, again 2L − h(t|L).
The saturation behaviour, and hence the notion of memory capacity, again contrasts with the scaling behaviour approximated by the various measures such as Mi, Mu and MR. This contrast holds irrespective of the values of network parameters used in simulations. Indeed the scaling behaviour of MR is almost independent of the value of Δθ except when it is too large, Δθ = 0.6 (Fig 5B). Furthermore, we find that the two contrasting behaviours, scaling and saturation, are fairly robust to changes in network parameters such as Δθ, S and a (S6 and S7 Figs).
“Performance” therefore depends very differently on L, if recall is taken to be terminated by errors, i.e. by the erroneous recall of an item that is not in STM. Thus, while the notion of STM capacity appears irrelevant if errors are ignored (given the scaling behaviour of the various quantities discussed above), it becomes quite relevant if errors are considered to be critical in the task.
In summary, we have shown that whether we get scaling or saturation in STM performance depends on the specific metric we use to measure it, both in the Potts network endowed with an STM mechanism and in our experiment. In free recall experiments, performance has often been quantified through the MR index, thereby ignoring errors. This scaling behaviour appears to hold even up to 512 items [12]. 
In contrast, taking our experiment as an example, we have shown that if errors are considered critical, in our case through the Mcorr measure, then the performance of human subjects actually saturates at about 8 items. In our model, which expresses a similar behaviour, this saturation is brought about by the interference from long-term memories.
Serial recall
Can the Potts model endowed with short-term memory also express behaviour similar to serial recall? This paradigm is very similar to free recall, but with a crucial difference: participants are instructed to recall items in the same order as they were presented. This makes the task more difficult and, for a model, relying on random walk dynamics would appear to be counterproductive. Clearly, the network model requires some extra ingredient to produce ordered sequences.
First, in light of the literature pointing at how STM span depends on the nature of items being remembered [14, 15, 31, 32], we have performed serial recall experiments with three different types of items, but within the same general paradigm. We asked participants to observe and repeat sequences of stimuli presented to them on the screen, either digits or spatial locations on a 2-dimensional grid (Fig 6A), and varied the time of presentation of the stimuli in the observed sequence. There were two conditions for the spatial locations, referred to as Locations and Trajectories: in the Locations condition, considered to involve only “discrete” items, the six chosen locations around the centre of the grid were highlighted in any order, while in the Trajectories condition, every next location was one of the six consecutive locations around the previous one, thus suggesting a “continuous” trajectory. Contrary to the free recall experiment reported above, in this task participants had to recall the material in the correct order, otherwise the trial was dismissed as incorrect. Participants started with short sequences of length 3; if they recalled them correctly in at least 3 out of 5 trials, the sequence length increased, until a memory capacity limit for this stimulus type and presentation time was reached. Fig 6B shows the capacity for serial recall in this task (see Methods for how we computed the memory capacity).
Fig 6
Short-term memory capacity for serial recall does not markedly depend on stimulus type.
(a): The 2D grid used in the serial recall experiment. Dots are presented sequentially as shown by the highlighted dots here (L = 8). (b): Memory capacity for serially presented stimuli for different presentation times: bars correspond to the average capacity across participants, while the distance from the bar to the dot of the same colour corresponds to the standard deviation of the mean. We performed the experiment for three different stimulus types, shown in different colours.
Our experiment yields two main results (Fig 6B). The first is that the type of stimulus does not affect the recall probability, except for a slight disadvantage in the discrete Locations condition, suggesting a universal mechanism for recall independent of the material, which manifests itself at the systems level. The second, which is more pronounced, is the effect of presentation time per stimulus, which, when shortened, makes it more difficult to correctly remember and repeat the longer sequences, suggesting a disadvantage at the encoding stage. We ask whether latching dynamics in the Potts model can reproduce this finding. Given that our results, as well as those from other studies [4], show limited dependence on stimulus material, hereafter we only consider the result with digits in order to establish a comparison with our model.
We used Model 2 (lower adaptive threshold for items held in STM) to constrain the dynamics into a subset of L = 6 patterns intended as the 6 digits of our experiment. In addition to that, we introduced heteroassociative weights, similar to Model 3, to provide the sequential order of presented digits (see Eq (24) in Methods).
We find a good agreement between our experimental data and the model (Fig 7). In addition, we find that human subjects perform better if the to-be-memorised digit series include ABA or AA (Fig 7A and 7C), in line with the notion that the repetition of an item aids memory [33-36]. 
Such sequences are not produced by our model, due to firing rate adaptation and inhibition preventing the network from falling back onto the same network state for time scales of the order τ2.
Fig 7
Serial recall of digits by human subjects and the Potts model.
(a): Proportion of correct trials in the serial recall task with digits. Data for all subjects (n = 36) are pooled together. Colour codes for presentation time (in ms). Dots are for sequences without repetitions like AA and ABA and circles are for all sequences. (b): Proportion of correct subsequences in a latching sequence of the Potts model. Colour codes for values of the heteroassociative strength λ, that hard-codes transitions into the weights. Circled (dotted) curves correspond to simulations with the boost Δθ = 0.1 (0.2). (c): Memory capacity computed from the curves of (a), (see Methods). (d): Recall capacity computed from latching sequences of the Potts model is shown by the same colour-coding as in (b). (e): The quality of latching (see Eq (26)), a measure of the discriminability of the individual memories composing a sequence, is shown for different values of λ and Δθ. (f): Proportion of correct subsequences in a latching sequence of the Potts model for Δθ = 0.1, λ = 0.01. The solid curve is for congruent instructions only and the dashed curve is for a shuffled version of intrinsic sequences.
The heteroassociative component of the learning rule (Eq (24) in Methods) provides “instructions” to the network regarding the sequential order of recall, allowing it to perform serial recall (this is to be contrasted with the model with a purely autoassociative learning rule, performing free recall). The strength of such instructions is expressed through the parameter λ. We find that this parameter plays a role similar to that of presentation time in our experiments; increasing it enhances performance, just as increasing the presentation time increases the performance of human subjects (Fig 7). However, values of λ that are too large again make performance worse and deteriorate the quality of latching (Fig 7E). 
The dynamics become a stereotyped sequence of patterns (see S8 Fig), without really converging towards attractors, and the sequence itself is progressively harder to decode. Therefore, the most functional scenario is when the heteroassociative instruction acts as a bias or a perturbation to the spontaneous latching dynamics rather than enforcing strictly guided latching in the Potts model. This is in sharp contrast with the mechanism for sequential retrieval envisaged in the model considered in [37], where the heteroassociative connections are the main and only factor driving the sequential dynamics; in that case, without them, there are no dynamics but rather, at most, the retrieval of only the first item. The effect of the lower adaptive threshold (expressed by Δθ) on latching sequences is to constrain the dynamics to a subset of presented items among p patterns, but values of Δθ that are too high degrade the performance as well as the quality of latching (Fig 7B, 7D and 7E).
As mentioned above, the Potts model produces latching sequences even without any heteroassociative instructions. This means that the free transition dynamics of the model may or may not coincide with the “instructions” provided by the heteroassociative weights. One question then naturally arises: how does the congruity between spontaneous, endogenous sequences and instructed ones affect the performance of the model? To see this effect, we obtain some intrinsic latching sequences by running simulations with λ = 0; from these latching sequences, we generate a set of instructions for the serial order. These instructions are congruous, inasmuch as they reproduce latching sequences emerging without any heteroassociative instructions. Then we compare the performance for these congruous instructions with that for incongruous instructions, which we obtain by shuffling the congruous ones. 
We find that the capacity of the model increases by as much as 1 item for the congruous case relative to the incongruous case (see the legend in Fig 7F).
These results, together with those from the previous two sections, indicate that intrinsic latching dynamics, similar to a random walk, can serve short-term memory (e.g., they can be utilised by free recall). Furthermore, latching dynamics can also serve serial recall, if supplemented by biases that modify the random walk trajectory; the modification (or perturbation) should be a quantitative one, which biases the random walk character of the trajectories, rather than an all-or-none, or qualitative one, that inhibits it. This is consistent with our recent experimental result, to be reported elsewhere, where “guided” serial recall leads to poorer performance than a non-guided control.
The trajectories in free recall
In previous sections we saw a reasonable agreement between some experimental measures and those extracted from simulating the Potts model. This agreement essentially results from two factors: first, the Potts model can produce a sequence of discrete activity patterns even though its governing equations are continuous at the microscopic level; and second, the dynamics of the Potts model visit the patterns in a random-walk like process. We now examine the sequences more closely to see what factors influence latching sequences and how the network wanders around the landscape of memorized patterns.
We first ask ourselves: once the network is cued with a given pattern, what elicits the retrieval of the next one? In previous studies [19, 22], it was shown that transitions occur most frequently between highly correlated patterns, when the Potts model serves a long-term memory function. We confirmed that this is also the case when the Potts model serves a short-term memory function, as in the current study (S9 Fig). Indeed, the larger the average correlation of one pattern with all other patterns in STM, the more often it is visited by the network (S10 Fig). This result is consistent with a recent experimental study on how memorability of words affects their retrieval in a paired-associates verbal memory task [38].
Next we probe the flow of information in the latching sequences of the STM model embedded in the Potts neural network by computing the normalised mutual information between two patterns as a function of their relative separation in a latching sequence, z (see Methods for details). We find that the mutual information is decreasing rapidly with respect to z, with a quasi-periodic modulation, reminiscent of the temporal profile of intensity of a damped oscillator (Fig 8A). 
The periodic modulation is much more evident for L = 16 than for L = 64; within the range of z we have considered, we see a peak at z ≈ 4.5 for γ = 0.0 and at z ≈ 3.5 for γ = 0.5, but we also see the second peak at z = 6 in addition to the first peak at z = 3 for γ = 1.0 (Fig 8A). The second peaks for γ = 0.0 and γ = 0.5 would be located at z ≈ 9 and z ≈ 7, respectively. The quasi-period of the “damped oscillation”, ζ, is twice the z–value of the first peak, therefore, decreasing with increasing γ, starting from ζ ≈ 9 at γ = 0.0 until ζ ≈ 6 at γ = 1.0. For L = 64, it is as if the damping ratio is too high to observe any periodicity.
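A plug-in estimate of the normalised mutual information at separation z can be sketched as follows. The exact estimator is given in Methods and may differ; here we assume that the joint distribution is estimated from all pairs of patterns z steps apart in one long latching sequence, and that the normalising entropy H is that of the overall visiting distribution. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution in a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())


def normalised_mutual_information(sequence, z):
    """Plug-in estimate of I(z) / H for patterns z latches apart.

    Assumptions: I(z) = H(X) + H(Y) - H(X, Y) over all pairs
    (sequence[t], sequence[t + z]), and H is the entropy of the overall
    visiting distribution of the sequence.
    """
    pairs = list(zip(sequence, sequence[z:]))
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    mutual_info = entropy(first) + entropy(second) - entropy(joint)
    h = entropy(Counter(sequence))
    return mutual_info / h if h > 0 else 0.0
```

As a sanity check, a strictly periodic sequence gives a ratio close to 1 at every separation, whereas an i.i.d. sequence gives a ratio close to 0; the latching sequences discussed here sit between these two extremes.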
Fig 8
Damped waves in pattern space.
(a): Mutual information as a function of the relative separation of two patterns in a latching sequence, z. The ordinate is the mutual information I(z) ≡ I(μ, ν|z) (see Methods for details) divided by the entropy H. Note the logarithmic scale of the y–axis. Parameters are Δθ = 0.3, L = 16 (64) for the curves marked with dots (open squares), w = (0.4, 0.8, 1.0) for γ = (0.0, 0.5, 1.0). (b): Distribution of distance, d, between two patterns that have the relative separation z in a latching sequence for L = 16, γ = 0.5 and w = 0.8. The black, vertical line indicates the mean value of d across all p patterns. The solid black curve is the PDF of d among all possible pairs between L patterns in STM. (c): Histograms for the visiting frequency of patterns in STM, given one pattern is recalled. The remaining L − 1 = 15 patterns are arranged along the x–axis by their visiting frequency at the next position of the currently retrieved pattern in a sequence (z = 1), giving three groups x1, x2 and x3 of 5 patterns each. Each group is further arranged symmetrically along the y–axis, with the most frequent pattern on the midline (y3). Visiting frequency is double-encoded by the height and colour of bars. The lonely, magenta bar behind the group x1 shows the visiting frequency of the currently recalled pattern once it returns at the position z.
This behaviour is related to how the Potts network “freely” forages the landscape of the embedded attractors. We visualize this nontrivial behaviour for γ = 0.5, where we not only see a kind of damped wave that “propagates” along the y–axis with the variable z as an effective “time”, but also see the “reflection” of the wave around z ≈ 3.5 (Fig 8C).
What causes these characteristics of the latching trajectories of the Potts model? To answer this question, we define a quantity, called d, which is an index of “semantic” distance between two patterns in their representational space. We define the distance between two patterns μ and ν as follows.
where Cas and Cad measure the correlation between two patterns (see Eqs (27) and (28) in Methods).
We consider the distribution of d(μ, ν), the distance between two patterns that are separated by z latches in a latching sequence, for 6 values of z (Fig 8B). At z = 1, latching occurs mostly between highly correlated patterns as expected, where the higher correlation is expressed by lower d. At the second step in a latching sequence (z = 2), patterns that have higher d values than the average value show a comparable proportion of the probability density curve relative to patterns with lower values of d. Then the proportion of higher d values is much larger than the proportion of lower d values for z = 3 and z = 4. This means that the network prefers to visit those patterns that are less correlated with the initially retrieved one at the third and fourth step. So we can say that the network reaches the most “distant” pattern from its “initial” pattern around z = 3.5, which is the “reflection” point of the wave (Fig 8C). As z increases further to reach 6, the density curve gets closer to the curve for z = 1, thus approaching the periodicity mentioned above. This periodicity is confirmed by S11 and S12 Figs.
These results indicate that latching trajectories by Potts networks have a quasi-random walk character, though biased by correlations between patterns in their representational space. This is consistent with earlier applications of latching dynamics to semantic priming [13].
Discussion
The Potts model offers a plausible cortical framework to discuss aspects of memory dynamics, without losing too much of the clarity afforded by simpler non-neural models. Indeed, a major difficulty with network models of memory storage in the human cortex, which have attempted to reflect its dual local and long-range connectivity [17, 39] by articulating interactions at both the local and global levels, is that their mathematical or even computational tractability usually has required ad hoc assumptions about memory organization. One example is the partition of memory items into a number of classes, in each of which memories are expressed by the activity of the same cortical modules [40], which makes it awkward to use such a network model to analyse the free or serial recall of arbitrary items. On the other hand, more abstract models have provided brilliant insight [27] which is hard, however, to relate to neural variables and neural constraints. By subsuming the local level into the dynamics of individual Potts variables, the statistical analysis can focus on the cortical level, which is effectively a reasonable compromise.
The (global) cortical level is in particular the one to consider in assessing short-term memory phenomena, in which interference from widely distributed long-term memories plays a central role. Experiments with lists of unrelated words are a prime example [13]. The free energy landscape of the Potts model provides a setting for quasi-discrete sequences of states, with properties that turn out to be similar to those of random walks. This happens, however, only within a specific parameter range, and only to a partial extent, so that often one has in practice several intertwined sequences, with simultaneous activation of multiple patterns, as well as pathological transitions, all characteristics with potential to account for psychological phenomena, and which are lost in a more abstract purely symbolic model. 
We have thus discussed three generic neural mechanisms that may contribute to restrict the random walk, approximately, from p to L items. Although they are not mutually exclusive, we have argued that the second such mechanism is the one most relevant to account for the recall of lists of unrelated items.
To model the recall of ordered lists, an additional heteroassociative mechanism can be activated, which biases the random walk, but again approximately, resulting in frequent errors and limited span. We have observed that, at least in the Potts network, if the heteroassociations, which amount to specific instructions, dominate the dynamics, the random character is lost. With it we lose the entire latching dynamics, which cannot be harnessed to just passively follow instructions.
In summary, a Potts network can generate quasi-discrete sequences from analog processes, with the possibility of errors in
1. the “digitalisation” into a string of discrete states, one at a time;
2. the restriction to L out of p items in LTM;
3. the order, both in the specific sense of serial order, and in the generic one of avoiding repetitions.
These possibilities for error reflect weaknesses of latching dynamics as a mechanism for short-term memory expressed by a Potts network, and at the same time underscore the value of the mechanistic model, inasmuch as similar “flaws” crop up in the phenomenology. The analysis of such flaws can lead to refinements of the model.
Thus, point 2, the difficulty of restricting latching dynamics to a subset of all the long-term memory representations, is made even more severe in paradigms that involve multiple subsets. For example, in analyses of the Phonological Output Buffer (POB) the hypothesis has been considered of multiple POBs, one holding simple phonemes, one function words, one numerals, etc., conceptually as a sort of separate drawers, or mini-stores [41].
If one accepts the evidence of a common substrate for working memory and long-term memory representations [42], one cannot resort to different “drawers”, i.e., different scratchpads or the like, where to temporarily hold the items from distinct subsets, and this makes enforcing the restriction more difficult. Likewise, one cannot regulate the correlation between the long-term representations, as one could do if new ad hoc representations were temporarily set up. These constraints can result in intrusions, a simple form of false memory, e.g. by items that are strongly semantically associated to items in a short-term memory list [43], or by items in prior lists [44]. It would be tempting to pursue a fully quantitative study of these phenomena [45] to try and extract constraints, for example, on the time course of the “boost” that models STM in the Potts network.
In relation to point 3, latching dynamics are intrinsically stochastic in nature, even in the absence of microscopic noise, because of the heterogeneity of the underlying microscopic states. With randomly correlated representations, trajectories among items are effectively random, with only a tendency to avoid close repetitions, as a result of the adaptation-based mechanism. Interestingly, a tendency to perceive random processes as less prone to repetition than they really are is a hallmark of human cognition [46]. Beyond the vanilla version of the model, however, it is rather trivial to incorporate e.g. adjustments of the time course of the boost, to produce primacy and recency, or adjustments of the correlations between pairs of representations to produce preferred transitions. What is more interesting and still lacking, to our knowledge, is again a quantitative study of the degree of randomness of the recall process, in the context of remembering lists for example, a study made inherently difficult by the need to use novel items in a within-subjects design.
The same need effectively prevents the analysis of the recalled string at the single neuron level: even when recording the activity of neurons in awake patients, only generic forms of selectivity can be reliably studied, e.g., that expressed by putative “time” cells [47]. Interestingly, such a study has been recently carried out in rats, pointing at the random walk character of the spatial trajectories they recall shortly after experiencing them [29]. While a similar approach cannot easily be extended to humans, to probe the dynamics of individual neurons, the Potts model can help interpret evidence at the integrated cortical level.
It is in its fallibility in the production of a simple string of items, however, that the Potts network offers crucial insight beyond that provided by simpler and more abstract models, in which the digitalisation of a string is a priori given. Latching dynamics can involve partially parallel strings, items incompletely recalled simultaneously with others, periods of utter confusion, stomping attempts. Statistically, they are all observed, with a prevalence determined by the various parameters. These flaws in the analog-to-digital transduction of the Potts model may be useful in the interpretation of electrophysiological data. One basic question in this domain is: can two items be simultaneously active in working memory? On this question, experimental evidence has been difficult to obtain, because a process that appears to involve two items active together might in fact rapidly alternate between them. Recently, however, the genuinely concurrent activation of two items has been reported with a model-based analysis of EEG data [48]. In that study, holding on to the two items meant better performance in the task, so it reflects a capability, not a flaw, of the short-term mechanism.
If extended to sequences of endogenously generated states, as the Potts model indicates would occur, at least in certain regimes, it would mean that not only the focus of attention when performing a similar task need not be unique, but also that parallel streams of thoughts can be entertained along partially interacting trajectories. This could be applied to interpret electrophysiological measures of mind wandering dynamics [49], with significant implications for our intuition about the unity of consciousness [50].
Methods
We studied the latching dynamics of the Potts network by extensive computer simulations. In a simulation, the network is first initialized by setting all variables at their equilibrium values. Then we cue the network with one of the memorized patterns, remove the cue and let the dynamics proceed. A simulation is terminated if the network shuts down into a globally stable null attractor (in which all units are inactive) or if the total number of updates reaches 10^5.
The Potts network as a model for short-term recall
The Potts network has been studied so far as a model of long-term memory; but it can also serve, with minimal modifications, as a model of short-term or working memory. It suffices to strengthen a few memory items, or sequences of items, by increasing the value of some pre-existing parameter, to effectively bring the network across a phase transition, as indicated in Fig 2. Evidence and arguments supporting the model of short-term memory as an activated portion of long-term memory can be found in [7].
The types of modifications we consider in this study all implement the assumption that, when a subject is performing a task of immediate recall, the attractors corresponding to the presented items have been facilitated at the encoding stage. We can visualize them as becoming wider and deeper in their basins. At the recall phase, then, we consider an item to have been recalled by the Potts network if the network activity becomes, at least for a brief time, most correlated with the corresponding attractor, among all LTM items. The facilitation of attractors for STM items can be done by changing distinct parameters of the network. We propose in Models three different models for short-term memory function.
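The recall criterion described above (an item counts as recalled when the network activity is, at least briefly, most correlated with its attractor) can be illustrated with a short sketch. It assumes that a simulation returns an array of overlaps with one row per time step and one column per LTM pattern; the function name `latching_sequence`, the array layout and the `min_overlap` threshold are illustrative assumptions, not part of the model definition.

```python
import numpy as np

def latching_sequence(overlaps, min_overlap=0.0):
    """Return the sequence of recalled pattern indices.

    overlaps: array of shape (T, p); overlaps[t, mu] is the overlap of the
        network state at time t with pattern mu (hypothetical layout).
    min_overlap: hypothetical threshold; time steps where even the largest
        overlap does not exceed it are ignored.
    """
    overlaps = np.asarray(overlaps, dtype=float)
    winners = overlaps.argmax(axis=1)            # most correlated pattern per step
    strong = overlaps.max(axis=1) > min_overlap  # is the winner above threshold?
    sequence = []
    for mu, ok in zip(winners, strong):
        # A new entry is added only when the winner differs from the last
        # recalled item, so a stretch of steps dominated by one pattern
        # (even if interrupted by sub-threshold steps) counts as one recall.
        if ok and (not sequence or sequence[-1] != int(mu)):
            sequence.append(int(mu))
    return sequence
```

With such a sequence in hand, quantities like the number of items recalled before the first repetition can be read off directly; how brief an episode may be and still count as a recall is a choice this sketch leaves to the threshold.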
The Potts model for serial recall
We use Model 2 to approximately constrain the dynamics to a subset of L0 patterns, for example the 6 digits of our experiment. We have p = 200 patterns in long-term memory, among which we give a Δθ boost to L0 = 6 patterns, indicated as 1, 2, 3, 4, 5, 6. In addition to the autoassociative connections between Potts units given by Eq (1), we introduce heteroassociative connections to mimic the sequential order of the items presented in the experiment: we randomly pick L items among the 6 items (1, 2, 3, 4, 5, 6), allowing repetitions. When L = 6, for example, the sequence can be 2 → 4 → 3 → 2 → 5 → 1. We do not, however, include sequences that have a subsequence like AA or ABA, because the Potts model cannot really express such sequences (they occasionally appear in the dynamics, but only when the transition from A to B is incomplete or anomalous). Sequences without subsequences of the form ABA and AA are favoured by the Potts network. So, for each given value of L, with L = 3, 4, …, 10, we prepare a set of 80 sequences that do not include any subsequence of the form ABA or AA. If we denote a sequence of this set as I_1, I_2, …, I_L, then the model for serial recall is determined by the following equations
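The construction of the sequence set described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed seed, and the choice not to enforce that the 80 sequences are all distinct are ours, not taken from the text. Since each new item only has to differ from the previous two, the number of admissible choices at each step does not depend on the history, so drawing items one at a time samples uniformly among admissible sequences.

```python
import random

ITEMS = (1, 2, 3, 4, 5, 6)  # the six boosted patterns (digits)

def sample_sequence(length, rng, items=ITEMS):
    """Draw one sequence with no subsequence of the form AA or ABA."""
    seq = []
    for _ in range(length):
        banned = set(seq[-2:])  # last item (AA) and the one before it (ABA)
        seq.append(rng.choice([x for x in items if x not in banned]))
    return seq

def has_forbidden_subsequence(seq):
    """True if seq contains AA or ABA anywhere."""
    repeats = any(a == b for a, b in zip(seq, seq[1:]))
    returns = any(a == c for a, c in zip(seq, seq[2:]))
    return repeats or returns

def make_sequence_set(length, n_sequences=80, seed=0):
    """Prepare a set of sequences of a given length (seed is hypothetical)."""
    rng = random.Random(seed)
    return [sample_sequence(length, rng) for _ in range(n_sequences)]
```

For example, `make_sequence_set(6)` returns 80 sequences of length 6, each of which passes the `has_forbidden_subsequence` check.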
Definition of relevant quantities
The quality of latching is evaluated by means of d12 − Q. Here d12 is the difference between the largest overlap and the next largest one, averaged over time and over the so-called quenched variables [22], while Q is the average overlap with the next-ranked patterns: m_μ is the overlap of the network activity with pattern μ, and μ_1, …, μ_{L−1} are the L − 1 patterns having the largest overlaps, excluding the maximum overlap. This quantity is a kind of measure of how “condensed”, i.e., partially recalled, the non-recalled patterns are.
The correlation between patterns is measured by two quantities [18, 19], Cas and Cad (Eqs (27) and (28)).
The average values of Cas and Cad over different realizations of randomly-correlated patterns are given by
Mutual information I(z) is computed by
$$I(z) = \sum_{\mu,\nu} P(\mu,\nu|z)\,\log\frac{P(\mu,\nu|z)}{P(\mu)\,P(\nu)},$$
where P(μ, ν|z) is the joint probability of observing pattern μ at position n and pattern ν at position n + z in a latching sequence, where n can be any integer between 1 and the length of the latching sequence. P(μ) is the marginal probability of observing pattern μ at any position of the latching sequence. Mutual information is then normalised by the entropy,
$$H = -\sum_{\mu} P(\mu)\,\log P(\mu).$$
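As an illustration of the mutual information measure described above, the following sketch estimates it from a single latching sequence, given as a list of pattern indices. It assumes the standard definition of mutual information between positions n and n + z and a normalisation by the entropy of the marginal distribution; the function name and the use of simple frequency counts as probability estimates are our assumptions.

```python
from collections import Counter
from math import log

def normalised_mutual_information(sequence, z):
    """Estimate I(z) / H from one latching sequence (list of pattern indices).

    Probabilities are plain frequency estimates. The joint distribution uses
    the len(sequence) - z available pairs, the marginal uses all positions,
    so small finite-size deviations (even slightly negative values) can occur.
    """
    if z < 1 or z >= len(sequence):
        raise ValueError("z must satisfy 1 <= z < len(sequence)")
    n_pairs = len(sequence) - z
    joint = Counter(zip(sequence[:-z], sequence[z:]))   # counts of (mu, nu) at lag z
    marginal = Counter(sequence)                        # counts of mu at any position
    p = {mu: c / len(sequence) for mu, c in marginal.items()}
    mi = sum((c / n_pairs) * log((c / n_pairs) / (p[mu] * p[nu]))
             for (mu, nu), c in joint.items())
    entropy = -sum(q * log(q) for q in p.values())
    return mi / entropy if entropy > 0 else 0.0
```

Because both the mutual information and the entropy use the same logarithm, the normalised value does not depend on the base. A strictly periodic sequence gives a value of 1 at a lag equal to its period, whereas an independent sequence gives values close to 0 at every lag.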
Network parameters
The network parameters used in this study are set as in S1 Table, if not specified explicitly.
Experiments of free recall and serial recall
Both experiments were conducted online, with participants recruited through https://www.prolific.co/.
Serial recall
The 36 participants were instructed to watch a sequence appear on the computer screen and to repeat the sequence just after, by clicking on the screen. They had to repeat sequences of L stimuli (L starting from 3). In each of the conditions, they had 5 trials for each length L, with L incremented by one until 3 out of 5 trials were incorrect; the last L is then taken as the limit capacity for this participant in this condition. For each participant the sequences were of all three stimulus variants:
(D) Digits out of {1, 2, 3, 4, 5, 6} on a black screen, presented one at a time
(L) Locations on a hexagonal grid highlighted one by one, out of 6 around the central (blue) dot
(T) Trajectories on the same hexagonal grid: now each consecutively highlighted dot is one of the 6 neighbors of the previous one (as shown in Fig 6A, the first one is always one of the six around the center).
Each stimulus was presented for one of three durations (in separate blocks): 400ms, 200ms, 100ms. First always came the 400ms training session, then either 200ms or 100ms (balanced), and then the remaining duration. Presentation order was balanced across duration and stimulus material. In additional experiments, landmarks on the grid were used as well as intermediate presentation times, but no significant effect on the recall performance was observed.
In order to measure the memory capacity in this serial recall task, we first plot the proportion of correct trials, P(L), as a function of L, either for each participant in Fig 6B or for the pooled data across all participants in Fig 7A. Although the minimum value of L we used was 3, we added two “data points” by hand to P(L), setting it to 1 (i.e., a putative 100%) for L = 1 and L = 2. We then compute the memory capacity as the simple sum,
$$\mathrm{AUC} = \sum_{L=1}^{L_{\max}} P(L),$$
where Lmax is the maximum value of L used in the experiment. This measure is usually referred to as Area Under the Curve or AUC [51].
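The capacity measure just described amounts to a few lines of code. The sketch below assumes that the proportions of correct trials are stored in a dictionary keyed by sequence length; the function name, the data layout, and the convention that lengths never reached by a participant contribute zero are illustrative assumptions.

```python
def memory_capacity_auc(proportion_correct, l_max):
    """Sum P(L) over L = 1..l_max, with P(1) = P(2) = 1 added by hand.

    proportion_correct: dict mapping a tested length L (L >= 3) to the
        proportion of correct trials at that length (hypothetical layout).
    l_max: maximum value of L used in the experiment.
    """
    total = 0.0
    for length in range(1, l_max + 1):
        if length <= 2:
            total += 1.0  # putative 100% for L = 1 and L = 2
        else:
            # Lengths that were never tested are assumed to contribute 0.
            total += proportion_correct.get(length, 0.0)
    return total
```

For instance, a participant with P(3) = 1.0, P(4) = 0.8, P(5) = 0.4 and P(6) = 0.2, with Lmax = 6, obtains 1 + 1 + 1.0 + 0.8 + 0.4 + 0.2 = 4.4.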
Free recall
The same hexagonal grid as in serial recall is used (Fig 4A). In this experiment, the stimuli of a set were presented all at once, and the participants (N = 40) were instructed to repeat as many as they could recall, by clicking on the dots in the grid. For each set size L in {4, 6, 8, 12, 16, 24, 32}, the participants had 5 trials, each trial allowing for 2L − (number of correctly recalled items) clicks. For example, if a participant had clicked 4 times in a trial with L = 4 and 3 of the clicked dots were correct, they had one more chance to reach the fourth correct dot, as 2L − 3 = 5. A set of size L was presented for log2(L) seconds.
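The click budget and the presentation time can be made concrete with a small sketch. It reflects our reading of the rule stated above (a participant may keep clicking while the number of clicks made is below 2L minus the number of correct items so far, and stops once all L items are recalled); the function names and this stopping interpretation are assumptions.

```python
from math import log2

def presentation_seconds(set_size):
    """A set of size L is shown for log2(L) seconds."""
    return log2(set_size)

def clicks_allowed(set_size, n_correct):
    """Click budget of a trial: 2L minus the number of correctly recalled items."""
    return 2 * set_size - n_correct

def may_click_again(set_size, n_clicks, n_correct):
    """Assumed stopping rule: continue while the budget is not exhausted
    and not all items have been recalled yet."""
    if n_correct >= set_size:
        return False
    return n_clicks < clicks_allowed(set_size, n_correct)
```

In the example of the text, with L = 4, four clicks made and three of them correct, `clicks_allowed(4, 3)` gives 5, so `may_click_again(4, 4, 3)` is true and the participant has exactly one more click.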
ΔMcorr is shown for several values of Δw from simulating Model 1.
The abscissa is the number of items in STM, L, in a log scale. The ordinate is ΔM ≡ Mcorr(Δw) − Mcorr(0), where Mcorr is the number of recalled STM items until the network either repeats an already-visited item or (mistakenly) retrieves one of the LTM items. Left: w = 1.0, right: w = 1.1. (PDF)
ΔMcorr is shown for several values of Δθ from simulating Model 2.
Details as in S1 Fig. (PDF)
ΔMcorr is shown for several values of ΔJ from simulating Model 3a.
Details as in S1 Fig. (PDF)
ΔMcorr is shown for several values of ΔJ from simulating Model 3b.
Details as in S1 Fig. (PDF)
The quantity R is shown as a function of L, obtained from simulations with the SAM++ model [12, 27] and with the Potts network endowed with long-term memory function.
This quantity (R), which is the number of visited STM items until the search process enters a loop, is well-defined only in the case of a symmetric similarity matrix. In other cases the quantity R is ill-defined: a closed loop is hardly ever observed in the search process, so we compute, instead, Mi1, which is the number of visited STM items until the network revisits one of the already-visited items, as a surrogate for R. The blue curve with squares is R(L) obtained from simulating the SAM++ model with random symmetric similarity matrices (1000 simulations). The blue curve with circles is R(L) obtained from simulating the SAM++ model with random non-symmetric similarity matrices (10000 simulations). In both cases the elements of the similarity matrices are drawn from a uniform distribution between 0 and 1. In the latter case, the degree of symmetry is 0.5 on average. The green line with diamonds is R(L) obtained from simulations of the Potts model without short-term boost in the intermediate inhibition regime (γ = 0.5, w = 1.4). We randomly pick L out of p = 200 patterns and treat them as if they were STM items. The solid black line is from the numerical evaluation of Eq (1) in S1 Appendix, which is derived from an equal-probability assumption. All lines shown here have a slope of approximately 0.5. (PDF)
Behaviour of Mi(L) is fairly robust to the values of Δθ.
Mi(L) is plotted for several values of Δθ from simulating Model 2. Mi is the number of recalled STM items until one of them is repeated twice. (PDF)
Mi, Mcorr and MR remain qualitatively the same with respect to changes in S and a.
These quantities remain qualitatively the same with respect to changes in S and a, as long as latching dynamics are stably maintained under these changes. Mcorr is the number of recalled STM items until the network either revisits one of the already-recalled STM items or visits one of the LTM items, but within a given number of latches, 2(L − h(t|L)), where h(t|L) is the number of correctly recalled STM items up to that point in time. MR is the number of correctly retrieved STM items within a given number of consecutive latches, set as 2(L − h(t|L)), ignoring errors and repetitions. (PDF)
Too high values of λ lead to faltering latching dynamics.
In serial recall by the Potts model, too high values of λ, the strength of the heteroassociative connections relative to the autoassociative ones, lead to faltering latching dynamics. Two example sequences are shown, for the same parameter values: ω = 1.0, γ = 0.5, Δθ = 0.1, λ = 0.05. Each colour corresponds to a different pattern. The proportion of simulations in which latching completely fails, as in the right panel, increases with λ. (PDF)
Scatter plot with Cas and Cad on the two axes.
See Eqs (27) and (28) in Methods for definitions of Cas and Cad. Each data point (obtained from Model 2 for L = 64) indicates, for enhanced clarity, an average over 3 pairs of patterns. Crosses (open circles) represent correlations averaged over the 3 most (least) frequent pairs, whose relative positions are determined by z in a latching sequence. Horizontal and vertical dashed lines indicate the average values of Cas and Cad over all patterns. At the first step (z = 1), latching occurs most frequently between highly correlated patterns, in agreement with previous studies on long-term memory. At the third step, the trend is reversed. (PDF)
Patterns that are visited more frequently seem to be those that share a larger number of active units with a larger set of patterns, reflected in the correlation matrix.
(a): Re-ordered transition matrix for p = 200 and L = 16 for one set of patterns, ordered according to the visit frequencies of each pattern in that data set. The matrix of transition probability has rows (where the network latches from, which in turn is just the probability of appearance of each pattern) that look roughly similar to the average row (with fluctuations), while the columns (where the network latches to) are very different from each other, from the heavy ones on the left to the light ones on the right. (b): C matrix (see Eq (27) in Methods for its definition), again ordered in the same way as in (a). The diagonal has been set to 0 artificially, in order for off-diagonal values to be more visible. (c): Mean correlation of each pattern in STM with all the others in STM, y, versus its visit frequency f for p = 200 and L = 16. Numbers indicate the pattern indices (16 of them). (PDF)
Probability density of d(μ_n, μ_{n+z}) shows the quasi-periodic evolution with respect to z.
Probability density of d(μ_n, μ_{n+z}) (see Eq (22)) is divided by the probability density of d(μ, ν) for all possible pairs among L patterns in STM from simulating Model 2. From z = 1 to z = 6, we can see the quasi-periodic evolution of the probability density function. Parameters are w = 0.8, γ = 0.5, L = 16, Δθ = 0.3. (PDF)
Visiting frequency of a pattern at the position n + z as a function of d(μ, μ) and d(μ, μ) from simulating Model 2.
Colour indicates the visiting frequency. From the upper left panel to the lower right one, we can see that the brightest spot (most frequent visits) rotates counter-clockwise. Dashed black lines indicate the average value across all pairs in STM on the corresponding axis. Parameters are w = 0.8, γ = 0.5, L = 16, Δθ = 0.3. (PDF)
Mutual information is plotted up to z = 9 to confirm the periodicity stated in Fig 8.
(PDF)
Deriving scaling law under the assumption of equal visits.
(PDF)
Parameter values used in simulations.
(PDF)
8 Mar 2021
Dear Mr. Ryom (and Allesandero),
Good news and bad news. The good news is that I really liked the explanation you provided of why our short term memory is so bad. The bad news is that, in my opinion, the paper was written in a way that made it next to impossible to figure out what was going on. I think I did figure it out, but mainly because I had an idea before starting. Without that, I would have been completely lost. And, given that my understanding of the paper was based mainly on priors, I may be totally wrong. So my main comment is: completely rewrite the paper, with a semi-naive reader in mind. I personally try to write my papers so that a math-competent person with no background in the particular subject can follow it relatively easily.
I'll give some specific comments below, but they're only to provide a flavor; I stopped reading after a while, as I got more and more lost. I believe you'll be instructed to provide a point-by-point rebuttal to these comments. You don't have to do that. Instead, all I ask is that the paper is easy for me to read. When that's the case, I'll send it out to reviewers. In its current state it's not ready.
Specific comments:
1. Probably the most serious problem is that you never actually say what's going on. Which, if I understand things, is as follows:
Start with an attractor network with p fixed points (memories). Add adaptation, so that the fixed points become unstable after a while, and the system hops from one to the other. Then pick L fixed points to be stored in short term memory, and strengthen them. Initialize the network at one of those L fixed points, and let it wander from one fixed point to another. When the network wanders outside the short term memory -- to one of the other p-L fixed points -- short term memory ends.
I will admit that I'm guessing on most of that.
I didn't read the whole paper, but none of that was stated in the parts I did read (intro, the beginning of results and the beginning of methods). The only hints I had were on lines 11-23, which is the only description I could find of latching dynamics, "... recall exploits the natural tendency of the cortex to hop from state to state -- latching dynamics", and on lines 55-56, where you say "We show that by adding a mechanism that gives an extra "kick" to a small subset of L among p patterns in long-term memory, ...", which I take to mean that some memories were strengthened.
It needs to be clear to the reader, from the beginning, what's going on -- stated simply, so that one doesn't have to work through the math, or rely on priors, to figure it out. I personally would drop the phrase "latching dynamics" altogether, since it doesn't have any intrinsic meaning to me (and it conjures up all sorts of phenomena that have nothing to do with memory). But if you do use it, be sure to define it carefully, and prominently.
2. The second most serious problem is that the equations are out of order. It was not possible for me to make sense of Eqs. 1, 2 or 3, since I didn't know the underlying network equations. I had to go to Methods to make sense of them. Since they're needed to understand the paper, the network equations should go first, in the main text.
3. Several comments on the network equations (now in methods):
a. Please tell us what you use for xi_i^mu.
b. I didn't understand the definition of a. Is it a function of the xi_i^mu?
c. Why are you using the Potts model rather than a Hopfield network?
d. Presumably the theta's are what causes the network to adapt. You should provide intuition as to why.
e. What's the significance of the superscript 0?
4. I did not understand lines 124-7.
5. Line 132: what's special about 0.3?
6. Paragraph starting on line 137: the expressions for P_L should be derived.
I may have been able to work them out myself, but I was pretty lost by this point, and it didn't help that I didn't know what a was.
7. Paragraph starting on line 146: you talk about sqrt{L} scaling, but isn't that ruled out by Fig. 1c, which shows that the number of latches is independent of L?
8. Fig. 2: what's gamma_A? It was probably defined somewhere, but it was not easily accessible.
9. Line 226: what's a "consecutive latch"?
At this point I was pretty lost, and stopped reading. Again, these are only to provide a flavor of the things that confused me. You should be able to make the paper easily understandable by a math-competent reader.
Peter Latham, Associate editor.
--formal letter follows
Thank you very much for submitting your manuscript "Latching dynamics as a basis for short-term recall" for consideration at PLOS Computational Biology.
As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.
We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.
When you are ready to resubmit, please upload the following:
[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments.
If eligible, we will contact you to opt in or out.
[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).
Important additional instructions are given below your reviewer comments.
Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.
Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.
Sincerely,
Peter E. Latham
Associate Editor
PLOS Computational Biology
Lyle Graham
Deputy Editor
PLOS Computational Biology
***********************
Figure Files:
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at .
Data Requirements:
Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc..
For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.
Reproducibility:
To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see
12 Apr 2021
Submitted filename: coverLetter_PCOMPBIOL-S-21-00280.pdf
14 Jun 2021
Dear Mr. Ryom,
Although this is a major revision, overall the reviewers were positive, and it doesn't look like it will be _too_ hard to make the changes. Says the person who doesn't have to make them. ;)
Peter
Reviewer's Responses to Questions
Comments to the Authors:
Please note here if the review is uploaded as an attachment.
Reviewer #1: Authors discuss retrieval capacity limitations using different measures and in different variation of Potts model for memory storage. In general, our knowledge of memory recall mechanisms in humans is very limited. Therefore, theoretical models, especially supported by experiments may help us constraint possible memory mechanisms. Current work presents some of the results in vague form that would better be sharpened before publication. For example, in lines 339-349 it appears that authors implicitly assume that the difference between long-term memory and short-term memory is in the way one measure performance, and not a separate mechanism. But this is so vague stated and buried in the rest of the text that it is not clear whether authors make this statement or not. Authors are trying to advance the idea that capacity of recall is limited by items interference.
Nevertheless, authors ignore the model that give estimated value of capacity close to one observed in experiments with humans. In contrast, they choose the model that has much larger expected capacity. The arguments to choose one specific model as the model for short-term memory are not convincing. Nevertheless, results of experiments are roughly consistent with simulations. Therefore, it is possible that the model describes some aspects of recall. In my opinion the manuscript should be revised to make points made clear.
Minor points.
Lns 112., 115. The same letter theta used for different variables which is confusing.
Ln. 129 “step towards plausibility” probably means towards realistic network
Lns 132-145. The statements may be more precise. (ln. 133 “perfectly retrieved”, ln. 136 “after the retrieval” what means retrieval? Perfect, above threshold on overlap?). (ln. 137 “Above phase transition” - is there a single parameter defining phase space?) (ln. 143. “but otherwise free” - ???)
Ln 214-215. “latching dynamics are effectively constrained to the L items, but only up to a given value of L” – meaning is not clear
Ln 241. “**ita** regime of operation, such that **the its** ability to spontaneously recall”
Section 3. It is not clear why one would like to have more latches that the number of stored memories. It is probably more important how many unique memories are recalled (latched).
Lns 339-349. The meaning of these two paragraphs are not clear. Do authors suggest that STS is not a brain mechanism but an artifact of quantification method? There are some modeling works in psychology that does not require STS per se to describe free and serial recall (e.g. doi:10.1037/0033-295X.114.3.539). But the rest of text is inconsistent with this statement. So, I guess, authors can make their statement more precise here.
Ln 368-369 “In this way we measure the memory capacity for serial recall by computing the Area Under the Curve (Fig. 
6b).” – Could authors be more precise what exactly was done and why AUC appears here?Ln 387 “such sequences as Potts-compatible” – that may be too strong statement, since one may be able to implement in the future Potts model that can recall arbitrary sequence. Reserving this strong term then would lead to confusion. This is not critical, for current presentation though.Ln 409-410 “These instructions are congruous, as they reproduce latching sequences emerging without any heteroassociative instructions” – it is not clear what is mathematically meant by instructions.Ln 415 “can serve short-term memory (e.g., free recall)” – meaning is not clearFig. 7 Blue dotted curve (a) 0.6+0.4+0.1 << UAC 3 in (c) “(c) Area Under the Curve (AUC) computed from the curves of (a), intended as a measure of overall performance.”Ln 431. Cad is referred (Fig S10) before it is defined.Ln 433 Cas is referred (Fig S11) before it is defined.Reviewer #2: The authors present a model of free and serial recall, which embodies the idea of short-term memory as the result of “priming” a sub-set of items in the long-term memory store. The long-term memory store is implemented by using a Potts model, and three different mechanisms for priming items are introduced and studied. The model is able to reproduce experimental observations, notably the “square root law”, and provide interesting insights into possible mechanism(s) responsible for the limited capacity of short-term memory.Recent work by Tsodyks and collaborators has shown that a random walk in “memory space” can account for subjects’ short-term memory performances to an amazingly quantitative detail. This work follows up on that thread and provides a more “microscopic” interpretation of the memory random walk. 
Importantly, in my opinion, it shows that the hypothesized dynamics can indeed occur in a distributed memory system; and strongly suggests that capacity limits are ultimately due to the distributed nature of the long-term store.
The results reported appear correct, as far as I can check. The paper is very well written and the figures are well chosen and informative. I have no comments.
Reviewer #3: The paper by Il Ryom et al is a great contribution for understanding the memory of sequences and short term memory of multiple objects. The authors employ their previously developed Potts network model of cortical memory processing and consider modifications of it (e.g. adaptation, synaptic plasticity etc) that allow a study of how short term memory of multiple objects as well as sequences can be formed from the stored long term memory patterns. In addition to this, they offer experimental data that allow understanding how each of these different modifications contribute to the errors (or regularities in the errors) that appear in the data. I think this work is important, as theoretical models for memory of sequences and short term memory, the capacity of such memories, and type of errors that they lead to have been paid much less attention to. This is particularly the case when compared to working memory models of e.g. objects, or models of long term memory. When it comes to sequence processing models that have been studied, more or less all rely on the standard way of storing and retrieving sequences, following the seminal work of Sompolinsky & Kanter PRL 1986 where hetero-associative connectivities are employed, without much studying how the presence of long term stored memories influence errors in retrieval. The current paper takes this further by considering how other mechanisms such as adaptation and intrinsic latching (the ability of the network to generate quasi-random sequences of long term stored patterns) interact with this heteroassociative connectivity.
The paper also provides a mechanistic model of the sqrt(L) law in free recall which is also quite novel. So, in principle, the paper is a suitable contribution to PLoS Comp Biol, though I think before publication, as I describe below, a number of changes should be done and a number of issues explained.
1. The first point that I think needs to be addressed by the author is how they pitch their work. The authors start by discussing their aim of explaining the low capacity of short term memory. For this, they cite a number of experimental results. The problem that I found confusing was that short term memory is one of those words that mean different things to different people. In the introduction and throughout the paper, the authors mention a range of experimental results from change detection experiments in visual working memory tasks to free recall tasks, some making quantitative observations, some qualitative, although all point to the direction of limited capacity (however defined in that particular task) of memory in those tasks. The model the authors describe, however, as far as I understand, is mainly about a system that generates a sequence of a certain number of objects in a given order, or quasi-randomly (e.g. in a free recall task). How, for instance, this relate to change detection memory tasks is not immediately clear.
I think the paper would benefit from a more focused introduction, and perhaps a section reviewing what in different and specific experimental setups people define as short term memory, what they use to measure capacity, and how the model described here relate to each. This I believe would be extremely useful for the audience of PLoS Comp Biol who may not all know all the different ways people define short term memory in experiments and how these different ways relate to each other.
2. In estimating L_c, the authors use the criterion P_L = 1-1/e combined with Eq. 14-17. Where does this criterion come from?
3. In sec 4.2, the authors define g(L) as the given number of consecutive latches and then set this to 4 log_2(L) - 2 to "establish a reasonable comparison with the results in [12]". Can you elaborate where this comes from?
4. Section 6. I think it would be useful if the authors explicitly give the expression for the mutual information between patterns. This can be defined in a number of ways, and it is only from the context that one can infer what the author mean by it. Same section, z is defined as the relative separation. Relative to what?
5. Again in section 6, it seems to me that the authors make a number of guesses that are not clearly stated.
For instance in line 442, they state where the second order peak would be located but don't say how they come to this conclusion, or how what zeta is twice the z-value of the first peak.
I also think it is better to refer to the peaks as first and second peaks instead of first order and second order peaks.
6. Last sentence of section 6 "This is consistent ...". I think the damped oscillation results in this section are quite interesting and it would be appreciated if the authors elaborate on the consistency with the results of sec 13 and/or discuss other possible ways experiments can support or have supported this phenomena.
7. Regarding the damped oscillatory behaviour of sec 6: could this be simply understood by the fact that there is a competition between adaptation (making a more correlated patterns less likely to follow) and correlations (making a more correlated pattern to be follow)? If so, is it not possible to write down an effective dynamics that describes the phenomena based on these two?
Throughout the paper the authors repeatedly mention the papers from Misha Tsodyks' group (refs 12 and 25).
I think it would be useful if the authors give a name to this model, instead of using they and their together with the refs.
line 87: which model patchers of cortex--> each of which representing the state of a single patch of the cortex
line 90: one quiet state: perhaps it should be said that this is a state where no pattern is retrieved, e.g. a background firing state, not necessarily a silent or no-firing state.
line 106: current--> input
line 111: effective inverse temperature is a physics jargon. I would say that beta measure the level of noise in the system.
line 141: revise the sentence. I cannot parse it.
line 152-155: long and difficult sentence to parse. it would perhaps be easier to say "are strengthened by increasing the value of some pre-existing parameters e.g. .... This increase effectively brings ...". Also I am not sure being items across a "network phase transition" means.
line 241: its--> its
lines 491-494: "We have ..." long sentence, hard to parse and I think incomplete.
line 459: wouldn't it better to use discrete instead of digital? digital is often taken to be equivalent to binary though it is a wrong take.
line 504: "paradigms that involve multiple subset". Please elaborate what this means.
line 503: what do you mean by "the time course of the "boost"."
**********
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available.
If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: None
Reviewer #2: Yes
Reviewer #3: Yes
**********
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: Yes: Yasser Roudi
Figure Files:
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us.
Data Requirements:
Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.
Reproducibility:
To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.
Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols
21 Jul 2021
Submitted filename: answer_to_reviewers.pdf
3 Sep 2021
Dear Mr. Ryom,
There are actually some minor revisions, but they're pretty easy, so it's not worth going another round. Congratulations on a very nice paper!
Peter
---formal letter follows
We are pleased to inform you that your manuscript 'Latching dynamics as a basis for short-term recall' has been provisionally accepted for publication in PLOS Computational Biology.
Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.
Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.
IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.
Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.
Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Peter E. Latham
Associate Editor
PLOS Computational Biology
Lyle Graham
Deputy Editor
PLOS Computational Biology
***********************************************************
Reviewer's Responses to Questions
Comments to the Authors:
Please note here if the review is uploaded as an attachment.
Reviewer #1: Authors carefully addressed all points raised before. The exposition is substantially improved.
Minor points.
There are few typos, e.g. ln 740. .... log_2^L ....
Reviewer #3: I am pretty happy with the revised version and the answers to my comments. I only have two minor comments now.
1. In their response to my comments 2 and 3, the authors have made clarification in the response letter, but have not adjusted the text of the paper. I think they should also add them in the text.
2. My comment re line 141 of the previous version. Try: "As the network hops from memory to memory, it can simulate free recall. This happens if latches are concentrated onto STM items, but otherwise free, i.e., not coerced by external agents."
**********
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: None
Reviewer #3: None
**********
PLOS authors have the option to publish the peer review history of their article (what does this mean?).
If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #3: No
10 Sep 2021
PCOMPBIOL-D-21-00218R2
Latching dynamics as a basis for short-term recall
Dear Dr Ryom,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Andrea Szabo
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol