Human cognition is fundamentally noisy. While routinely regarded as a nuisance in experimental investigation, the few studies investigating properties of cognitive noise have found surprising structure. A first line of research has shown that inter-response-time distributions are heavy-tailed. That is, response times between subsequent trials usually change only a small amount, but with occasional large changes. A second, separate, line of research has found that participants' estimates and response times both exhibit long-range autocorrelations (i.e., 1/f noise). Thus, each judgment and response time not only depends on its immediate predecessor but also on many previous responses. These two lines of research use different tasks and have distinct theoretical explanations: models that account for heavy-tailed response times do not predict 1/f autocorrelations and vice versa. Here, we find that 1/f noise and heavy-tailed response distributions co-occur in both types of tasks. We also show that a statistical sampling algorithm, developed to deal with patchy environments, generates both heavy-tailed distributions and 1/f noise, suggesting that cognitive noise may be a functional adaptation to dealing with a complex world.
Human cognition is fundamentally noisy across all kinds of judgments and behaviors [1-3]. In empirical research, noise (often treated as residuals in experimental inquiry) is generally assumed to be a random fluctuation independent of the underlying signal and previous trials, hence a nuisance variable, which is removed by averaging results or counterbalancing experimental designs. Therefore, these noisy residual fluctuations were typically assumed to play no functional role in cognitive tasks, and so in models of cognition they are very often characterized as independent draws from a Gaussian distribution. However, studies investigating the properties of noise in human cognition have instead found interesting structure [3-6].

First, while continuous responses are usually assumed to be normally distributed, it has been found that, in free recall tasks, inter-response intervals (IRIs) follow heavy-tailed distributions. Participants asked to recall animal names (see Fig 1A) mostly produced short intervals between retrievals of animal names, but infrequently their retrieval intervals were much longer. These heavy-tailed distributions of retrieval times, l, were well described as power laws, P(l) ∼ l^−μ, with many participants exhibiting tail exponents of μ ≈ 2 [4]. Interestingly, research on animal foraging suggests that the mobility patterns of a wide array of species also exhibit the same exponents, which are mathematically optimal for blind search in environments in which resources are clumped together [7]. Together, these results suggest that human memory retrieval amounts to foraging in a patchy psychological space [6].
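The tail exponent μ in analyses like this is typically estimated by maximum likelihood. The sketch below illustrates the idea on synthetic data; the estimator is the standard continuous power-law MLE, while the sample size, random seed, and lower cutoff l_min are our illustrative choices, not values from the experiments.

```python
import numpy as np

def fit_tail_exponent(samples, l_min=1.0):
    """Maximum-likelihood estimate of mu, assuming P(l) ~ l^-mu for l >= l_min."""
    tail = samples[samples >= l_min]
    return 1.0 + len(tail) / np.sum(np.log(tail / l_min))

rng = np.random.default_rng(0)
# Synthetic retrieval intervals with a true tail exponent mu = 2
# (numpy's pareto(a) has tail exponent a + 1).
iri = 1.0 + rng.pareto(1.0, size=100_000)
mu_hat = fit_tail_exponent(iri)  # should recover a value close to 2
```

With this many samples the estimator is tight; on real IRI data the choice of l_min matters and stricter tail tests (as in the paper) are advisable.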
Fig 1
Experiments and results.
(A) Animal naming experiment. (B) Time estimation experiment. Within each row from left to right, there are illustrations of the experimental procedure, quantile-quantile plots of successive changes in IRI or time estimates to test for heavy tails (95% CI shaded around means and the horizontal dashed lines denoting the Gaussian distribution), and power spectra to test for 1/f noise (fitted at frequencies less than 0.1, 95% CI shaded around means).
Second, when participants make repeated estimates of a temporal duration or a spatial magnitude, their responses are not independent, nor only dependent on the last estimate as would be predicted by a random-walk model. Instead, they also depend on long-ago estimates [3]. Similar long-range autocorrelations also occur in the response times of many other cognitive tasks such as mental rotation [5]. These correlations are best described as 1/f noise and can be expressed in the frequency domain, S(f) ∼ 1/f^α, where f is frequency, S(f) is spectral power, and exponents α ∈ [0.5, 1.5] are considered 1/f scaling [3, 5]. 1/f noise has been explained as the result of complex organisms displaying self-organized criticality, meaning the organisms are inherently and continuously transitioning between different stable states and as a consequence exhibit complex behavior [8]. It also appears to be a signature of active cognitive processing, because it disappears if participants are not asked to do anything more than simply press a button in response to a randomly-occurring target [3].

It is very rare to find a cognitive model that predicts either heavy-tailed distributions in trial-by-trial changes or 1/f noise. And because these two effects have largely been studied in isolation, even the small number of models that predict 1/f noise do not predict heavy tails, and vice versa.
It is not trivial to produce 1/f noise, and the most common descriptive model, fractional Brownian motion, predicts a Gaussian distribution of successive changes rather than a heavy-tailed distribution [9]. Conversely, the most common model of heavy tails in successive changes, the Lévy flight, is a random-walk model that does not produce long-range autocorrelations [7]. In sum, standard models suggest that heavy-tailed distributions do not by themselves imply 1/f noise, and vice versa.

Here, we investigate whether 1/f noise and heavy-tailed distributions of trial-by-trial changes co-occur in the same experimental task. We ran two experiments with very different tasks: an animal naming task, previously used to show heavy tails [4], and a time estimation task, previously used to show 1/f noise [3]. In the animal naming experiment, participants were instructed to type animal names as they came to mind, with the only constraint being that successive names needed to be different. In the time estimation experiment, participants were first presented a demonstration of a target time interval (1/3, 1, or 3 seconds) and then were asked to repeatedly reproduce the interval, as if they were drumming (see Fig 1 and Materials and Methods for details).

To determine whether trial-by-trial changes followed a heavy-tailed distribution, we fit μ following a similar procedure to [10], and also used stricter tests based on directly comparing heavy and exponential tails [11] that show qualitatively very similar results (see Text A in S1 Supporting information). To measure the autocorrelation exponent α, we fit a line to the windowed, log-binned power spectrum for low frequencies (i.e., less than 0.1) following [3].
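The spectral side of this analysis can be sketched as follows. This is a simplified version that fits a plain (unwindowed, unbinned) periodogram rather than the windowed, log-binned spectrum used in the paper; the signal length, seed, and frequency cutoff are our illustrative choices.

```python
import numpy as np

def one_over_f_noise(n, alpha, rng):
    """Generate a 1/f^alpha signal by shaping the spectrum of white noise."""
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                      # avoid division by zero at DC
    spectrum = rng.standard_normal(len(f)) + 1j * rng.standard_normal(len(f))
    spectrum *= f ** (-alpha / 2.0)  # power then falls off as 1/f^alpha
    return np.fft.irfft(spectrum, n)

def fit_spectral_exponent(x, f_max=0.1):
    """Fit alpha in S(f) ~ 1/f^alpha from the low-frequency periodogram."""
    f = np.fft.rfftfreq(len(x))
    power = np.abs(np.fft.rfft(x)) ** 2
    keep = (f > 0) & (f < f_max)
    slope, _ = np.polyfit(np.log(f[keep]), np.log(power[keep]), 1)
    return -slope

rng = np.random.default_rng(1)
x = one_over_f_noise(2**14, alpha=1.0, rng=rng)
alpha_hat = fit_spectral_exponent(x)  # should recover a value near 1
```

Restricting the fit to frequencies below 0.1, as in the paper, keeps the estimate focused on the long-range (low-frequency) part of the spectrum.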
Results
In the animal naming experiment, both the pooled data (see Fig 1A) and all individual exponents indicate heavy-tailed distributions (μ in the range of [1.29, 2.61], within the (1, 3] range that indicates heavy tails), replicating [4]. Congruent with the idea of memory foraging, participants were more likely to report successive animal names that belonged to the same category (i.e., patch) than to other categories. In addition, IRIs were longer when transitioning between categories, and IRIs were correlated with distance in a semantic embedding (see Text A in S1 Supporting information). Finally and critically, these data also show 1/f noise, both for the pooled results and for 9 of 10 individual participants (α in the range of [0.38, 0.90]).

Our time estimation experiment also found 1/f noise, replicating [3]. For the pooled data, α was 1.10, 1.45, and 1.27 for the 1/3s, 1s, and 3s conditions, respectively. 21 of 30 participants had exponents in the 1/f range (α in the range of [0.53, 2.06]). Importantly, the time estimation experiment also showed heavy-tailed trial-by-trial changes in the pooled data (see Fig 1B) and in 23 of 30 participants (μ in the range of [0.87, 2.67]). We present evidence in Text A in S1 Supporting information that this finding was not due to lapses of attention.

Overall, in both experiments, heavy-tailed trial-by-trial changes and 1/f noise co-occurred, and many individuals exhibited convincing evidence of both effects: 9 of 10 participants in the animal naming experiment and 17 of 30 participants in the time estimation experiment.

The co-occurrence of heavy tails and 1/f noise invalidates the most common accounts for each and calls for another explanation. One possible direction is to describe noise using a more complex statistical process, such as Brownian motion in multifractal time, which can, for some parameter settings, produce both features of cognitive noise [12].
However, this account is incomplete, as it does not explain how people are able to perform the time estimation task: it is silent on why participants' average estimates tracked the target times (M = 0.36s for 1/3s, M = 1.19s for 1s, and M = 3.50s for 3s). That is, descriptive models of noise offer a very incomplete account of human behavior.

An interesting alternative that does explain performance casts the mind as an intuitive statistician: the brain creates probabilistic models of an uncertain world and acts according to the prescriptions of these models when taking action [13, 14]. These models have successfully explained many aspects of effortful cognitive tasks, which, like our animal naming task, require memory retrieval [15], as well as automatic-seeming perceptual tasks such as time estimation [16, 17]. However, they have been criticized on the grounds that they are far too complex to be psychologically plausible, and that people show systematic deviations from Bayesian statistical models.

Statisticians address the intractability of probabilistic models by using approximations such as sampling algorithms, and it could be that the brain does the same: it uses sampling algorithms similar to those used in statistics to approximate otherwise intractable solutions. Sampling approximations are appealing because they show many of the same deviations from exact probabilistic inference that people do [18], and they provide an explanation for behavioral and neural variability [19, 20].

There are many types of sampling algorithms. The simplest, drawing independent samples, requires knowing the probability of every hypothesis, which is itself often computationally intractable; this led to the development of a sophisticated family of sampling algorithms called Markov Chain Monte Carlo (MCMC) [21]. MCMC algorithms traverse the space of hypotheses using only local information about the probability distribution.
These local transitions mean that successive samples are unavoidably autocorrelated, a necessary evil, as autocorrelated samples often convey less information than the same number of independent samples.

The ability to operate with only local information about a probability distribution is the strength of MCMC algorithms, but it also introduces weaknesses. Multimodal probability distributions, in which multiple clusters of high-probability hypotheses are separated by regions of low-probability hypotheses, are a challenge for these algorithms. Multimodal probability distributions formalize the idea of patchy mental representations, such as those that researchers assume participants use in animal naming experiments [4].

Having only local knowledge, MCMC algorithms often get stuck in one patch and require a large number of iterations before they can visit other isolated modes. As illustrated in Fig 2, this is a problem both for basic MCMC algorithms such as Random Walk Metropolis (RWM) and for more advanced versions such as Hamiltonian Monte Carlo (HMC). To address these weaknesses, statisticians have developed MCMC algorithms that deal better with multimodal representations. One of the first, Metropolis-coupled MCMC (MC3), runs multiple MCMC chains in parallel: while one chain produces the samples, the remaining chains explore the space for isolated modes [22]. When an exploring chain finds an isolated mode, the sampling chain is likely to swap positions with the exploring chain and then start sampling from that isolated mode (see Fig 2 and descriptions in Text B in S1 Supporting information). This interaction between chains is crucial for MC3 to switch locations between modes (assuming an exploratory chain has reached another mode). However, there are foraging rules that explore patchy environments with only knowledge about the mode the agent is situated in (see [23, 24] for detail).
For these foraging rules, switching to a new mode often requires random exploration, as is also the case for both RWM and HMC. Nonetheless, there are potential overlaps between foraging rules and sampling algorithms as any foraging rule that visits locations according to their probabilities is a sampling algorithm. Therefore, foraging rules with memory that can produce autocorrelations are interesting candidates for future investigation.
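The chain-swapping mechanism described above can be sketched in a few lines. This is a minimal one-dimensional illustration under assumptions of our own, not the paper's implementation: a two-mode Gaussian mixture target, a four-rung temperature ladder, Random Walk Metropolis updates within each tempered chain, and one proposed swap between adjacent chains per iteration.

```python
import numpy as np

def log_target(x):
    """Bimodal target: equal mixture of N(-5, 1) and N(+5, 1), unnormalized."""
    return np.logaddexp(-0.5 * (x + 5) ** 2, -0.5 * (x - 5) ** 2)

def mc3(n_iter=20_000, betas=(1.0, 0.5, 0.25, 0.1), step=1.0, seed=0):
    """Minimal MC3 sketch: each chain runs Random Walk Metropolis on a
    tempered target pi(x)^beta; adjacent chains propose state swaps;
    only the cold (beta = 1) chain is recorded."""
    rng = np.random.default_rng(seed)
    x = np.full(len(betas), -5.0)          # all chains start in one mode
    samples = np.empty(n_iter)
    for t in range(n_iter):
        # Random Walk Metropolis update within each tempered chain
        for k, beta in enumerate(betas):
            prop = x[k] + step * rng.standard_normal()
            if np.log(rng.random()) < beta * (log_target(prop) - log_target(x[k])):
                x[k] = prop
        # Propose one swap between a random pair of adjacent chains
        k = rng.integers(len(betas) - 1)
        log_ratio = (betas[k] - betas[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
        if np.log(rng.random()) < log_ratio:
            x[k], x[k + 1] = x[k + 1], x[k]
        samples[t] = x[0]                  # the cold chain produces the samples
    return samples

samples = mc3()
frac_right = np.mean(samples > 0)  # cold chain should visit both modes
```

The hottest chain sees a flattened landscape and crosses the low-probability region easily; swaps then propagate the discovered mode down to the cold chain, which a single RWM chain started at -5 would reach only very rarely.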
Fig 2
Example trajectories for the three samplers for a mixture of multivariate Gaussians.
(A) Random Walk Metropolis (RWM), (B) Hamiltonian Monte Carlo (HMC), and (C) Metropolis-coupled MCMC (MC3).
Past work has not compared how well these different MCMC algorithms match human behavior, most likely because they are difficult to distinguish in most experimental data; time-series data are needed to do so. Neither 1/f noise nor heavy-tailed changes naturally arise from any of these algorithms: in particular, any 1/f-like noise would be the result of a mixture of processes with specific scales and would show a power-law relationship only within a limited range. There is no reason to expect a priori that the samplers will emit both heavy-tailed changes and 1/f noise.

We quantitatively investigated RWM, HMC, and MC3 to see which, if any, would produce heavy-tailed distributions and 1/f noise. The sampling algorithms were assumed to operate in a hypothesis space, which in the time estimation task was the space of possible time estimates and in the animal naming task was a semantic space of possible animal names. From inspection, the distributions of responses in the time estimation task were unimodal, and so we used a Gaussian distribution as the target distribution, P(H), for all three sampling algorithms in this task, and time estimates were assumed to be direct readouts of the sampled hypotheses, h ∼ P(H). Animal name IRIs, however, are more complex to model, as they require further assumptions about both the target distribution and how the samples relate to IRIs. The distribution we used (shown in Fig 2) was obtained by fitting a Gaussian mixture model to the animal names arranged in a two-dimensional abstract semantic space, in which the sampled points could lie between animal names (see Text C in S1 Supporting information for details).
To relate samples to IRIs, we assumed that samples were generated at a constant rate but at random times (i.e., following a Poisson process), and that the animal name "in mind" was the animal name nearest to the current position of the sampler in the semantic space. In line with our experimental instructions, which asked participants to report an animal name as soon as the name they had in mind changed, IRIs were on average proportional to the number of samples generated before the nearest animal name to the sampler's position changed. Further details about these assumptions, and an exploration of an alternative in which IRIs are related to the distance between samples, are given in Text C and Table A in S1 Supporting information.

To assess which sampling algorithm best describes the co-occurrence of heavy tails and autocorrelation observed in human data, we performed a likelihood-free model comparison known as Approximate Bayesian Computation (ABC) [25]. ABC approximates the gold-standard measure of marginal likelihood and is particularly suitable for our data because (i) the presence of autocorrelations in time series is notoriously difficult to control for properly with traditional methods, (ii) simulating a sampler's trace is relatively easy, and (iii) the simulated and observed α and μ are compared directly. The detailed ABC procedure is given in Text C in S1 Supporting information.

Table 1 summarises the model comparison results for the three sampling algorithms. We find that RWM and MC3 perform similarly overall on the animal naming task, with a small advantage for RWM: from the marginal likelihoods, RWM was 2.2 times more likely than MC3. Both RWM and MC3 decisively outperformed HMC: both were more than 300,000 times more likely than HMC.
As discussed in Text C in S1 Supporting information, HMC performed better when assuming that IRIs were related to the distance the sampler travelled in the semantic space, but the versions of RWM and MC3 presented here were still more than 25 times more likely than the best-performing version of HMC. This, of course, means that conclusions about the sampling algorithms cannot be drawn without also specifying how the samples relate to behavior. RWM is the best-fitting model for the largest proportion of participants, but the protected exceedance probability (i.e., the probability that a model describes the greatest number of participants [26]) was unconvincing. In the time estimation experiment, MC3 decisively provides a better account of participant data than either HMC or RWM: it was more than 2 × 10^47 times more likely than either alternative algorithm. MC3 also convincingly fit the largest number of participants, as measured both by raw counts and by protected exceedance probabilities.
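The ABC logic used for this comparison can be illustrated generically. The paper's actual summary statistics, tolerances, and priors are given in the supporting information; the sketch below is a plain rejection-ABC demonstration on a toy Gaussian model, with all names and numerical choices ours.

```python
import numpy as np

def rejection_abc(observed_stats, simulate, prior_sample, n_draws, eps, rng):
    """Generic rejection ABC: draw parameters from the prior, simulate data,
    and keep parameters whose summary statistics land within eps of the
    observed statistics."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        stats = simulate(theta, rng)
        if np.linalg.norm(stats - observed_stats) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy demonstration: recover the mean of a Gaussian from its sample mean.
rng = np.random.default_rng(2)
observed = np.array([rng.normal(1.5, 1.0, 200).mean()])
posterior = rejection_abc(
    observed_stats=observed,
    simulate=lambda th, r: np.array([r.normal(th, 1.0, 200).mean()]),
    prior_sample=lambda r: r.uniform(-5, 5),
    n_draws=20_000,
    eps=0.05,
    rng=rng,
)
```

In the paper's setting, `simulate` would run a sampler (RWM, HMC, or MC3) and return the fitted α and μ of its trace, so that simulated and observed exponents are compared directly.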
Table 1
Model scores for the three sampling algorithms: Random Walk Metropolis (RWM), Hamiltonian Monte Carlo (HMC), and Metropolis-coupled Markov Chain Monte Carlo (MC3).
Experiment        Model Scores                           RWM         HMC         MC3
Animal Naming     log marginal likelihood                -16.0844    -29.5261    -16.8500
                  number of best-fitting participants    7           0           3
                  protected exceedance probability       0.59        0.16        0.25
Time Estimation   log marginal likelihood                -303.7798   -301.9607   -193.0191
                  number of best-fitting participants    0           2           28
                  protected exceedance probability       0.00        0.00        1.00
Materials and methods
Ethics statement
Ethical approval for the experiments was given by the Department of Psychology Research Ethics Committee at the University of Warwick. Written informed consent was obtained from the participants.
Animal naming experiment
Participants
Ten native English speakers (6 female and 4 male, aged between 19 and 25 years) were recruited from the SONA subject pool of the University of Warwick.
Procedure
Participants were asked to type animal names as they came to mind and were explicitly instructed that they could resubmit previous animals, though not consecutively. Once participants had typed an animal name, they submitted it by pressing the Enter key. The inter-response interval (IRI) between two consecutive animal names was the duration between the last Enter press and the next keystroke. The experiment lasted about 60 minutes or until the participant submitted 1024 animal names. Participants received £6 for participating.
Analyses
IRIs were calculated per participant for successive submissions. On average, it took participants 5.30 seconds (SD = 11.03) to submit an animal name. We fitted autocorrelation exponents to the IRI sequences using windowed, log-binned fits [3]. Per-participant correlation exponent (α) fits ranged from 0.38 to 0.90 (M = 0.68, SD = 0.12), and 9 of 10 exponents were in the 1/f range. We also obtained per-participant tail exponents (μ) ranging from 1.29 to 2.61 (M = 1.80, SD = 0.49) (see Fig 3 for details).
Fig 3
Individual data for the animal naming experiment.
(A) Power spectral density. (B) Histograms of inter-response intervals. Fitted lines and curves are overlaid with exponents.
Individual data for the animal naming experiment.
(A) Power spectral density. (B) Histograms of inter-response intervals. Fitted lines and curves are overlaid with exponents.
Time estimation experiment
Another 37 participants (13 male, 23 female, 1 undisclosed gender, aged between 18 and 41) were recruited through the SONA subject pool of the University of Warwick.

Participants first listened to a sample of the target temporal interval for 60 seconds, presented as computer-generated beeps. Following this familiarization period, participants were instructed to reproduce the beeps (effectively, to estimate the target time interval) by pressing the spacebar when they believed the target interval had elapsed. The next trial began as soon as the last one ended, making the task similar to 'drumming' at the rate of the target interval. The experiment was terminated when participants produced 1030 keystrokes or when the maximum experimental duration was reached. Participants were paid relative to the maximum duration of the experiment, which varied across the three conditions (6, 20, and 60 mins); they received £2, £4, and £6, respectively.

Of the initial sample, 30 participants completed over 512 time estimates, and their data were analysed in the main text. There were 10 participants for each of the three target time interval conditions (1/3s, 1s, and 3s). To exclude possible resting periods, we only analysed time estimates that were less than 3 times the target time interval. Further analyses excluding possible resting periods are presented in Text A in S1 Supporting information. Per-participant correlation exponents (α) ranged from 0.53 to 2.06 (M = 1.27, SD = 0.44), and tail exponents (μ) ranged from 0.87 to 2.67 (M = 1.60, SD = 0.59) (see Fig 4 for details).
Fig 4
Individual data for the time estimation experiment.
(A) Power spectral density. (B) Histograms of absolute changes in time estimates. Fitted lines and curves are overlaid with exponents.
Individual data for the time estimation experiment.
(A) Power spectral density. (B) Histograms of absolute changes in time estimates. Fitted lines and curves are overlaid with exponents.
Discussion and conclusions
Very few cognitive models predict either heavy-tailed distributions of trial-by-trial changes or 1/f noise, and none of which we are aware predict them both. However, in replicating two standard cognitive tasks, each of which has been independently used to identify either heavy tails or 1/f noise, we find strong evidence for the co-occurrence of heavy tails and 1/f noise.

We explored the functional role of heavy tails and 1/f noise through the lens of approximations to Bayesian models of cognition. Three sample-based approximations were studied: RWM, HMC, and MC3. In the time estimation task, MC3 described the human data better than RWM and HMC in terms of both marginal likelihood and number of best-fitting participants. In the animal naming task, MC3 and RWM were comparable on marginal likelihood, with the number of best-fitting participants favouring the latter. However, these measures of relative performance do not answer the question of absolute performance: whether MC3 produces the key outcomes of heavy tails and 1/f noise. To answer this critical question, we calculated the modal α and modal μ from the posterior distribution for each participant and classified each as indicating heavy tails or 1/f noise in the same way we did with the experimental participants. We found a perfect correspondence in animal naming and a strong correspondence in time estimation, with MC3 showing a slightly lower prevalence of 1/f noise but a greater prevalence of heavy tails than the human participants (see Table 2).
Table 2
Correspondence of 1/f noise and heavy tails between human data and the MC3 posterior predictive distribution.
                                   Animal Naming Experiment      Time Estimation Experiment
                                   1/f noise      heavy tails    1/f noise      heavy tails
Human data                         yes    no      yes    no      yes    no      yes    no
MC3 posterior predictives    yes   9      0       10     0       18     0       23     7
                             no    0      1       0      0       3      9       0      0
Overall, we find that MC3 provides a good account of the human data in two very different tasks. This is perhaps less surprising in the animal naming task, as the patchy representation is one the algorithm was designed for, and complex environments can produce complex behavior from even simple algorithms. It is more surprising in the time estimation task. We speculate that people behave like MC3 in this task because it is a generally useful algorithm: while it is unnecessary to run multiple Markov chains to sample from a unimodal distribution, it can be difficult to identify when a distribution truly is unimodal, as there might always be a distant mode that the sampling algorithm has yet to encounter.

Our finding that 1/f noise and heavy-tailed trial-by-trial changes co-occur provides more information about the structure of the variability in human cognition, and is useful in distinguishing between different accounts of noise. While other accounts can describe the co-occurrence, the success of a sampling algorithm in doing so while accomplishing task goals raises the possibility that noisy responses are the signature of a rational approximation in action, rather than a systematic problem with the brain's hardware.

Text A: Data Analysis Methods. Text B: Sampling Algorithms. Text C: Model Comparison Methods. Table A: Model scores for the three sampling algorithms in the animal naming experiment using the alternative method for generating IRIs. (PDF)

6 Dec 2021

Dear Dr. Zhu,

Thank you very much for submitting your manuscript "Understanding the Structure of Cognitive Noise" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments.
Reviewers 1 and 3 are relatively positive, but Reviewer 2 recommends rejection. I find Reviewer 2's arguments fairly compelling, but I also think that you might be able to address them. In addition, I agree with Reviewer 1 that 27% (while greater than 4% and 2%) is not particularly impressive in absolute terms. I'm not sure that this result provides such compelling evidence for MC3.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Samuel J. Gershman
Deputy Editor
PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: This paper explores two interesting features of human response-time distributions: (a) their heavy-tailed shape and (b) their autocorrelation over long time periods (1/f noise). Although there have been theories that explain each of these features of RTs independently, the authors show that they can co-occur in two disparate cognitive tasks (animal naming and rhythmic time estimation), which suggests a potential common explanation for (a) and (b). In simulations, the authors show that both of these features can arise from a specific type of MCMC algorithm (MC3), which efficiently transitions between different chains when one chain seems to find a novel mode. Thus, if the human brain is implementing a similar algorithm when performing tasks like animal naming, as some cognitive theories suggest, then this provides a parsimonious explanation for both the heavy-tailed RT distribution and 1/f noise.

I enjoyed reading this paper and found the ideas intriguing. Although I believe the work is well worth being out in the literature, it would be helpful for the authors to fill in some key details in a revision.

Perhaps this is partly explained by my relative unfamiliarity with the subject matter, but I suspect many people would benefit if the authors put in more effort to connect the empirical section of the paper with the simulation work. For example, as of now, I don't even have a sense of exactly how the authors envision that the time estimation task might use an MCMC algorithm. The task involves rhythmically tapping the space bar at the same tempo for a long period of time. This doesn't strike me as the kind of cognitively taxing prediction/estimation task that MCMC is typically used for.
Moreover, although I have some sense of how MCMC could be relevant for the animal naming task, I'm still not quite sure how to map the MCMC algorithm onto the behavior. For example, when an animal pops into a person's mind, what is the corresponding 'event' in MCMC? Presumably subjects would not be reporting literally every sample from that algorithm, so what constitutes a sample that reaches conscious awareness?

I also worry a bit about the generalizability of the simulation results. Even for the MC3 algorithm, only 27% of parameterizations yielded the combined distributional features that are characteristic of human responses. Is there good reason to believe that these parameterizations, or the MC3 algorithm itself, for that matter, are psychologically plausible? This may be a difficult question to answer given the lack of research on this topic, but more context for how these parameters relate to psychology would be useful for assessing the plausibility of the theory. Furthermore, why was the MCMC algorithm only run on relatively simple probability distributions? Why not model an environment that resembles the more patchy semantic network for animals or other task-relevant domains?

In sum, the theoretical ideas explored in this paper seem quite interesting, but the authors could expand upon the ways in which their simulation models connect to the psychological phenomena of interest and provide more detail about how well the models really capture human behavior.

Minor:

- I'm not sure what this means: "1/f noise has been explained as the result of complex organisms displaying a self-organized criticality" (p. 4).

- In Figure 1 (or another figure), it would be helpful to see some simple histograms of participants' actual RTs. I don't have a great sense of how well these distributions are fitted by a power law or exhibit 1/f autocorrelation.
Is there any way to directly overlay model predictions on data?

Reviewer #2: The authors show that two human tasks exhibit both heavy-tailed distributions of trial-to-trial differences as well as 1/f autocorrelations. They posit that the co-occurrence of these two statistical properties is difficult to explain under standard accounts, and thus might reveal something about the cognitive mechanisms underlying the noise. Finally, they argue that Metropolis-coupled MCMC exhibits the same combination of heavy-tailed, 1/f-autocorrelated behavior, suggesting that this is what people are doing.

I have two major reservations about this paper: (1) I am not convinced that the combination of heavy-tailed distributions and 1/f autocorrelations is particularly exotic, and (2) I am not convinced that the characterization of MC^3 dynamics is sufficiently diagnostic. Together, these two reservations undermine the major contribution of this paper: the suggestion that the correspondence of these dynamics in human behavior and inference algorithms reveals something about human cognitive mechanisms.

How rare is the co-occurrence of heavy-tailed difference distributions and 1/f autocorrelations? This is the first time I have heard the claim that the co-occurrence is rare, so I went to the obvious place to check: natural images. I downloaded a random grayscale natural image from Google image search, pulled out one row of pixels, and evaluated the heavy-tail-ness of the pairwise pixel luminance differences, as well as the log-log slope of the power spectrum. I got both: the log-log slope of the power spectrum was -0.6, and the log-log slope of the mass in the tails of the distribution was -1.3. So the first place I looked that has natural 1/f noise also has heavy tails for the adjacent pairwise distances (according to the authors' criteria).
If luminance in natural images has this co-occurrence as well, I am not sure what we have to learn about cognitive mechanisms from this co-occurrence. If the authors want to make any kind of argument about cognitive mechanisms from these characteristics of response-time distributions, it seems important to show that heavy-tailed differences are rare among the many natural phenomena that exhibit 1/f noise (https://en.wikipedia.org/wiki/Pink_noise#Occurrence), because if DNA sequences, quasar light emissions, and meteorological data series are like luminance in natural images and response times, and also exhibit the conjunction of 1/f noise and heavy-tailed distributions, then appealing to a particular cognitive mechanism to explain the co-occurrence sounds a bit far-fetched.

My next concern pertains to evaluating whether MC^3 dynamics exhibit these properties. As far as I can tell, the authors fit lines in log-log space to either the power spectrum or to the tail probabilities of difference distributions, and then adopted certain cutoffs for acceptable slopes. I find the logic of this approach a bit confusing, because one can always fit a line in log-log space. Spectra and tail probabilities will (nearly always) be diminishing, so the slope will be negative, and thus some exponent fitting the tail probabilities or power spectrum will be identified. So even fitting the tail of a normal distribution in this manner will yield a value of mu. The value of mu that will be obtained for non-power-law relationships will depend greatly on the range of values being considered. For instance, in fitting a power law to the tail of a standard normal distribution, I can get an exponent of -2 when considering z-scores of 1 to 2 (when considering z-scores from 1 to 5, I get an exponent of -7.5). So a lot of the action in calling these distributions "power law" amounts to details of how the range of the tail is chosen, and what exponent cutoffs are used.
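The reviewer's range-dependence point can be reproduced in a few lines. The following is a minimal stdlib-only sketch, not the paper's actual fitting procedure: the z-ranges, grid size, and plain least-squares fit are illustrative assumptions. It fits a straight line to log P(Z > z) versus log z for a standard normal over two different ranges and shows that the fitted "exponent" changes drastically with the chosen range, even though no power law is present.

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def fitted_exponent(z_lo, z_hi, n=50):
    """Least-squares slope of log P(Z > z) versus log z over [z_lo, z_hi]:
    the 'power-law exponent' a straight-line fit would report for this range."""
    zs = [z_lo + i * (z_hi - z_lo) / (n - 1) for i in range(n)]
    xs = [math.log(z) for z in zs]
    ys = [math.log(normal_tail(z)) for z in zs]
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

narrow = fitted_exponent(1.0, 2.0)  # fit over z in [1, 2]
wide = fitted_exponent(1.0, 5.0)    # fit over z in [1, 5]
print(narrow, wide)
# Both fits return a negative "exponent" for a Gaussian tail, and the value
# steepens sharply as the fitted range widens -- the number reflects the
# chosen range, not an underlying power law.
```

The exact values depend on the grid and fitting details, which is precisely the reviewer's concern: without a principled choice of tail range and a goodness-of-fit test, a log-log slope alone does not establish a power law.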
Given the somewhat vague criteria for calling some difference distribution heavy-tailed, or some power spectrum 1/f, the pithy summary in Table 1 leaves much to be desired. For instance:

- If one tried really hard in specifying a particular posterior distribution and proposal distributions, is it really the case that random-walk Metropolis and Hamiltonian Monte Carlo would not be able to meet the criteria for alpha and mu? If some specifications regularly work, but some don't, what does that mean?
- On the flip side, is meeting the alpha and mu criteria roughly half the time really that compelling of evidence that MC^3 exhibits 1/f dynamics and heavy-tailed change distributions?

Basically, both the negative and positive conclusions drawn from Table 1 seem questionable to me, so I'm not sure how much we can conclude based on the differences among algorithms.

Altogether, I am left with the impression that (a) the co-occurrence of heavy-tailed distributions and 1/f dynamics is not unique to cognition, so any cognition-specific explanation of the co-occurrence seems overly narrow, and (b) the tests that aimed to evaluate whether particular inference algorithms also exhibited these characteristics seemed not very diagnostic. The net result leaves much to be desired. Moving forward, I would like to see some stronger support for the claim that the conjunction of heavy tails and 1/f dynamics is somehow unique to cognition, and more rigorous evaluations of these properties in candidate algorithms.

Reviewer #3: Review of "Understanding the Structure of Cognitive Noise", submitted to PLOS Computational Biology (PCOMPBIOL-D-21-01984)

Reviewed by Peter M. Todd, Indiana University Cognitive Science Program

This is a compelling paper that convincingly shows that two types of noise seen in human behavior can systematically co-occur, and demonstrates a cognitive sampling mechanism that can produce that systematicity.
The two empirical tasks are appropriate for making these points, along with the comparison between three sampling algorithms to fit the human data. The authors give support for the useful perspective that noise can be "the signature of a rational approximation in action" rather than an indication of cognitive or neural error as others have been arguing. Overall, the paper makes an important contribution on an interesting question with theoretical impact, is well written, and contains sufficient detail to afford replication and further research.

It is good to see two quite different tasks—naming animals and estimating durations—being used to show instances of co-occurrence of the two noise patterns under consideration. Then for the possible underlying mechanism as a sampler over a particular task-related distribution, the applicability of a multimodal distribution to the animal naming task is clear, as people produce multiple clusters of related animal names, but how does it apply to the time estimation task? Is there the expectation in the time literature that an individual would have multiple likely time estimate values (multiple modes in the distribution) that they would be sampling from (rather than one mode)? Or is the point to show that in a task that probably has a unimodal outcome distribution, the Metropolis-coupled MCMC algorithm can still (at least sometimes) show both heavy-tailed and 1/f noise patterns? This should be clarified.

The Metropolis-coupled MCMC algorithm used here switches between patches based on comparing the current patch with another one that has already been discovered, which could possibly be what is happening via parallel search in someone's mind, but seems unlikely for individual animals foraging among patches in space. Humans also appear to use some patch-switching rules analogous to those used by foraging animals, in some tasks involving mental search (e.g., a giving-up-time rule—see Wilke et al., 2009).
It could be interesting to see how this algorithm could work with such patch-switching rules that only use "local" information about the current patch, not about other available patches.

Wilke, A., Todd, P.M., and Hutchinson, J.M.C. (2009). Fishing for the right words: Decision rules for human foraging behavior in external and internal search tasks. Cognitive Science, 33, 497-529.

This question also points to a difference between the two types of tasks used here. In the standard animal naming task, there is the aspect of "using up" the resources in each visited mode, since animal names are supposed to not be repeated, and hence there is in some sense an increasing impetus driving the person to switch from the current mode/patch to a new one (or at least, there's a benefit to switching at some point—though perhaps not in this paper's particular version of the task, where repetitions *are* allowed). But in the time estimation task, that impetus does not seem to be present—why should a person (or the sampler) move away from a given mode in this case? (And again, as mentioned above, why expect that there *is* more than one mode in the time estimate distribution?)

Finally, as another way to think about the likelihood of the co-occurrence of heavy tails and 1/f noise: given a particular set of values drawn from a heavy-tailed distribution, how likely is any particular sequential ordering of those values to show long-range autocorrelations like 1/f noise? (For example, what proportion of scrambled re-orderings of the data from these two tasks still show 1/f autocorrelation?) Does this differ if the values are produced by a Lévy distribution versus by a non-blind patchy foraging process like area-restricted search operating on particular kinds of patch-size distributions (e.g., the latter could produce long IRIs/durations that are more regularly spaced across a sequence than the former)?
If so, could this distinguish the processes going on in the two paradigms here (memory search and time estimation), possibly indicating why all participants showed both patterns in memory search but about half of participants showed both patterns in time estimation?

Detailed comments by page (using manuscript-page numbers):

MS Page 5 top: "Conversely, the most common model of heavy tails in successive changes, the Lévy flight, is a random-walk model that does not produce long-range autocorrelations [7]." – This is presumably because the Lévy flights are assumed to apply to blind foraging over patchy resources with no memory, in contrast to commonly seen area-restricted search strategies that do involve memory (e.g., making small movements to stay foraging locally as long as resources are plentiful) and hence can produce autocorrelation.

P5 bottom: "all individual exponents indicate heavy-tailed distributions" – Indicate what criterion is used to make this assessment (e.g., some particular range of tail exponent mu?). (I assume the criterion for 1/f noise given at the bottom of p. 3 is what is being used for that classification in the paper.)

P5 bottom: "participants were more likely to report sequential animal names that belonged to the same category (e.g., patch) than other categories" – Replace "sequential" with "successive" to emphasize that this is about the movement from one animal name to the very next one (not necessarily a longer sequence)?

Pp. 5-6: The heavy-tail exponent estimates (mu) should be given for both tasks here (also so they can be compared with the mu estimates provided in the Supplementary Text on pp. 16-17).

P13 top: It is surprising to see that participants could resubmit the same animal name repeatedly, which is not typically allowed in this verbal fluency task (e.g., not in Rhodes & Turvey, who specified "without repetition" but still got around 4 repetitions over 20 minutes). Why was that done here?
Perhaps to get around the potential problem of "using up" resources in this task, which differs from the time estimation task? (See comment above.) How often *did* participants repeat animal names, and did they do so while still in a particular patch, or later after returning to a specific patch?

P18 middle, Algorithm 1: The call to MCMC Step is missing the second argument (of three).

P19 middle, Algorithm 2: In the line "for s = 0 : M//2 do", M is undefined.

P20 middle, Algorithm 3: Why are there even line numbers? In line (4), I is undefined (should it be 1?). And in line (5), "for l:L do", shouldn't this be "for j=1:L do" (l is undefined)?

P21 top: "We evaluated power-law exponents and sample autocorrelations for MCMC, MC3, and HMC" – Should MCMC be RWM (as on the previous line)?

P22 middle: Should "MCMC" everywhere it appears in the last paragraph be "RWM"? And why is MMMD used later in the paragraph when just M is used at the beginning—do these refer to two different values?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code—e.g.,
participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: No: The data is available online, but I did not see the code there (though pseudocode is given in the supplementary text)

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.
Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

1 Apr 2022
Submitted filename: response_letter.pdf

10 May 2022

Dear Dr. Zhu,

Thank you very much for submitting your manuscript "Understanding the Structure of Cognitive Noise" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time.
Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Samuel J. Gershman
Deputy Editor
PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: I appreciate the authors' thoughtful responses to my comments and the helpful changes they made to the paper. I have no further concerns and think the paper is ready for publication.

Reviewer #3: Review of revision of "Understanding the Structure of Cognitive Noise", submitted to PLOS Computational Biology (PCOMPBIOL-D-21-01984-R1)

Reviewed by Peter M. Todd, Indiana University Cognitive Science Program

Overall, the authors have done a thorough and (to me) mostly convincing job of responding to the reviewers' comments. It is good to see their clarification of their aim to show both that heavy tails and 1/f noise co-occur in some cognitive activities, and that standard cognitive models do not produce this co-occurrence, but MC3 does; a brief summary of this aim would be good to include in the text of the paper as well. It was also very helpful to see that long-tailed distributions by themselves do not imply 1/f patterns—that might also be helpful for readers to know and so briefly mention in the paper.

Detailed comments by page:

P10 top: "In line with our experimental instructions, IRIs was on average proportional to the number of samples that were generated before the nearest animal name to the sampler's position changed." – This further explanation of the relation between the sampling process and IRIs is very good to include, but it made me realize that I was unclear on another aspect of the sampling method, namely that samples are drawn from abstract points in the semantic space, *not* (solely) from points corresponding to particular words.
This is clear for the time distribution in the first task, but perhaps should be filled in more for the semantic search task, e.g., that somehow the mind is sampling abstract semantic representations and only when these get close enough to the location of an actual word in the space is that word produced. It is also not clear to me in the above sentence what part of the instructions is meant here, and how the instructions would affect the relationship between (human-generated) IRIs and (model-generated) number of samples—clarify.

P10 top: "Further details about these assumptions and an exploration of an alternative in which IRIs are related to the distance between samples are given in Appendix A." – This actually appears at the end of Appendix B, not A. It is good to see this comparison with an alternative cognitive model as a robustness check; however, the comparison is somewhat concerning, because it shows that the results are *not* robust to this small change in the model, which makes HMC outperform MC3 and RWM (but not outperform MC3 and RWM in the original model in Table 1). This should be discussed more, to help readers decide what the implications of this result are for the overall generalizability of the results and conclusions of the paper.

P11 Discussion: "Across the two tasks, these results indicate that MC3 better described the human data compared to RWM and HMC." – Does this mean averaging/combining across the two tasks? This seems misleading, given that RWM (slightly) outperformed MC3 on animal naming.

P12 Table 2: It is not clear to me what the "MC3 no" line in the table represents. It doesn't seem to correspond to "human data" as mentioned in the table caption, which I expected.
Clarify.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes
Reviewer #3: Yes

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #3: No

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

14 Jun 2022
Submitted filename: r2_response_letter.pdf

16 Jun 2022

Dear Dr.
Zhu,

We are pleased to inform you that your manuscript 'Understanding the Structure of Cognitive Noise' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be coordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Samuel J. Gershman
Deputy Editor
PLOS Computational Biology

11 Aug 2022

PCOMPBIOL-D-21-01984R2
Understanding the Structure of Cognitive Noise

Dear Dr Zhu,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors.
Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Olena Szabo
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol