Sze Chai Kwok1, Emiliano Macaluso2. 1. Key Laboratory of Brain Functional Genomics, Ministry of Education, Shanghai Key Laboratory of Brain Functional Genomics, Institute of Cognitive Neuroscience, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China; NYU-ECNU Institute of Brain and Cognitive Science, NYU-Shanghai University, Shanghai, China; Neuroimaging Laboratory, Fondazione Santa Lucia, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Rome, Italy. Electronic address: sze-chai.kwok@st-hughs.oxon.org. 2. Neuroimaging Laboratory, Fondazione Santa Lucia, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Rome, Italy.
Abstract
Recent demonstrations of scale invariance in cognitive domains prompted us to investigate whether a scale-free pattern might exist in retrieving the temporal order of events from episodic memory. We present four experiments using an encoding-retrieval paradigm with naturalistic stimuli (movies or video clips). Our studies show that temporal order judgement retrieval times were negatively correlated with the temporal separation between two events in the movie. This relation held, irrespective of whether temporal distances were on the order of tens of minutes (Exp 1-2) or just a few seconds (Exp 3-4). Using the SIMPLE model, we factored in the retention delays between encoding and retrieval (delays of 24 h, 15 min, 1.5-2.5 s, and 0.5 s for Exp 1-4, respectively) and computed a temporal similarity score for each trial. We found a positive relation between similarity and retrieval times; that is, the more temporally similar two events, the slower the retrieval of their temporal order. Using Bayesian analysis, we confirmed the equivalence of the RT/similarity relation across all experiments, which included a vast range of temporal distances and retention delays. These results provide evidence for scale invariance during the retrieval of temporal order of episodic memories.
Scaling laws describe the existence of processes or patterns that are repeated across different scales of analysis (Kello et al., 2010). Scientific laws characteristically hold over a range of scales, such as the Gutenberg–Richter law for earthquake magnitude, the structural self-similarity of fractals, and animal foraging patterns. In cognition, examples include Zipf’s law (1949) to model the relationship between occurrence frequency of a word and its frequency rank, and Stevens’ law (1957) to characterise the relationship between the magnitude of a stimulus and its perceived intensity.
Several lines of evidence have indicated the presence of scale invariance in memory. For example, the shape of serial position effect curves in serial and free recall exhibits scale invariance. As long as the ratio between the interval between items and the interval between study and test is kept constant, the slope of the recency curve remains unchanged (Bjork and Whitten, 1974, Glenberg et al., 1983; also see Chater & Brown, 2008 for examples in other cognitive domains). Another type of scaling law was discovered when free recall of items from semantic memory was compared to animal foraging behaviours (Rhodes & Turvey, 2007). The authors found that the inter-response time intervals of searching for items from memory conform to the Lévy distribution, which is commonly seen in animal foraging (Viswanathan et al., 1999). In another free recall study, Maylor, Chater, and Brown (2001) asked participants to recall what they did (or would do) in the previous (or next) day, week or year, and found that the rate of item recall was unvarying across recall span.
According to local distinctiveness models, memories can be located by their position along a timeline, such that recent items occupying “nearer” and more discriminable locations are easier to retrieve than items stored at locations more distant from the current point in time (Neath, Brown, McCormack, Chater, & Freeman, 2006). 
This is opposed to the more traditional global distinctiveness models, which instead assume that the distinctiveness of items is determined by their distances from all items to be discriminated (Murdock, 1960).
Taking the local distinctiveness assumption into consideration, the Scale-Invariant Memory, Perception, and Learning (SIMPLE) model proposed by Brown, Neath, and Chater (2007) states that items in memory are stored in terms of their location on a timeline that extends from the present backwards to the past. Importantly, in the context of encoding-retrieval tasks, the model takes into account not only the temporal distance between events at encoding, but also the time between the encoding and the retrieval of these events (i.e., retention delay). For instance, two different memory traces encoded 5 versus 25 s in the past will be as confusable with each other as two traces encoded 5 versus 25 min in the past (5:25 for both cases). The actual temporal distances are magnified by 60 times, but the scale of similarity between the two events in question is kept constant (temporal ratio = 1/5), akin to other amplification examples shown in memory tasks of varying time scales (Laming, 2010, Morin et al., 2010; cf. also Weberian compression, Shepard, 1987). The local distinctiveness principle in retrieval has been tested in studies employing simple probe items, such as words (Murdock, 1962), as well as studies targeting more complex, real-world situations (da Costa Pinto & Baddeley, 1991). Here we capitalised on this concept and examined the local distinctiveness effect across several experiments covering a vast range of temporal intervals. 
Critically, this then allowed us to test the hypothesis of scale invariance by comparing the effect of distance/delay on retrieval performance across different datasets.
Accordingly, we measured temporal order retrieval performance for events with temporal distances in the range of 0.5–31.7 min (Exp 1 and Exp 2) and 0.60–4.96 s (Exp 3 and Exp 4), and with retention delays of 24 h, 15 min, 1.5–2.5 s, or 0.5 s for Exp 1–Exp 4 respectively (see Table 1). For each trial in each dataset, we combined temporal distance and retention delay into a single measure of “memory trace similarity”. The similarity scores were computed using the SIMPLE model (Eq. (1) in Brown et al. (2007), p. 544; see also detailed procedure in Section 3.1). For each subject, on a trial-by-trial basis, we used the similarity scores as predictors for the observed data (i.e., retrieval times, RT). The resultant RT/similarity slopes were then compared across the different datasets. Under the scale invariance hypothesis, we predicted the existence of a fixed relationship between retrieval performance and similarity across the wide range of temporal distances of events and retention delays. This was formally assessed using both standard ANOVAs and Bayesian statistics.
Table 1
Description of materials for encoding, retrieval tests and participants in each of the experiments. For the retrieval test, the range of temporal distances (TD) is reported in both video frames and seconds. One second of movie contained 25 frames. The retention delays are reported only in absolute time (i.e., in min or s). (∗) Fifteen of the 29 participants in Exp 1 (the same behavioural data were previously reported in Kwok et al. (2012)) and all of the 17 participants in Exp 3 performed the tasks inside an MRI scanner. SEM is the standard error of the mean.
| | Exp 1 | Exp 2 | Exp 3 | Exp 4 |
|---|---|---|---|---|
| Participant details | | | | |
| Number of participants | 29∗ | 15 | 17∗ | 15 |
| Mean age (SEM) | 25.6 (0.8) | 23.1 (0.8) | 25.8 (0.8) | 25.3 (1.0) |
| Encoding materials | | | | |
| Length of movies (in time) | 42 min | 42 min | 7.72–11.40 s | 7.72–11.40 s |
| Length of movies (in frames) | 59,432 | 59,432 | 193–285 | 193–285 |
| Retrieval test | | | | |
| Retention delay | 24 h | 15 min | 1.5–2.5 s, variable | 0.5 s |
| Shortest TD: in frames/in s | 821/33 | 2485/99 | 15/0.6 | 15/0.6 |
| Longest TD: in frames/in s | 47,678/1907 | 51,045/2042 | 124/5.0 | 124/5.0 |
| No. of trials | 100 | 160 | 96 | 96 |
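The frame and second values for the temporal distances in Table 1 are related by the frame rate of 25 frames per second given in the caption. A minimal sketch of that conversion follows; the function name `frames_to_seconds` and the dictionary layout are ours, chosen for illustration, and are not part of the original analysis.

```python
FRAMES_PER_SECOND = 25  # frame rate stated in the Table 1 caption


def frames_to_seconds(n_frames: int) -> float:
    """Convert a temporal distance in video frames to seconds."""
    return n_frames / FRAMES_PER_SECOND


# Shortest and longest temporal distances (TD) in frames, copied from Table 1.
td_frames = {
    "Exp 1": (821, 47678),
    "Exp 2": (2485, 51045),
    "Exp 3": (15, 124),
    "Exp 4": (15, 124),
}

# Same distances expressed in seconds.
td_seconds = {
    exp: (frames_to_seconds(shortest), frames_to_seconds(longest))
    for exp, (shortest, longest) in td_frames.items()
}

for exp, (shortest, longest) in td_seconds.items():
    print(f"{exp}: {shortest:.2f} s to {longest:.2f} s")
```

The table reports the long-movie values rounded to whole seconds.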
Material and methods
Participants
A total of seventy-six subjects participated in four experiments. Each of them participated in only one of the experiments (see Table 1). All subjects had normal or corrected-to-normal vision and signed an informed consent statement approved by the Santa Lucia Foundation (Scientific Institute for Research Hospitalization and Health Care) Independent Ethics Committee, in accordance with the Declaration of Helsinki.
Paradigm overview
All four experiments were memory studies making use of cinematic materials so as to allow us to model human memories of relatively more naturalistic episodic information (see Fig. 1). This highlights the distinction between natural vision (e.g., movies, see Furman et al., 2012, Haxby et al., 2011) and conventional visual memory studies which often use simpler stimuli (e.g., lists of words). The memory traces created during movie watching should resemble our episodic experience in daily life more closely than lists of unrelated words. In Exp 1 and 2, participants watched one relatively long movie during encoding (duration: 42 min). After either a long (24 h, Exp 1) or a short (15 min, Exp 2) retention delay, they were asked to judge the temporal order of pairs of images extracted from the movie. Because of the length of the movie, the temporal distances between the two images presented at retrieval ranged between 0.5 min and 34.0 min in Exp 1 and 2. In contrast, in Exp 3 and 4, we used a number of shorter movie clips (duration: 7.72–11.4 s) and interleaved the encoding and the retrieval phases with very short retention delays of 1.5–2.5 s in Exp 3 and 0.5 s in Exp 4. The range of temporal distances tested at retrieval for Exp 3 and 4 was 0.60–4.96 s.
Fig. 1
An example of a trial and schema of how temporal similarity was derived. All four experiments used cinematic materials as stimuli, and each trial consisted of two phases: encoding and retrieval. Participants either watched one long 42-min movie (Exp 1–2) or short 7–11 s video clips (Exp 3–4), and following a retention delay, made discriminative judgements between a pair of images extracted from the movie/clip. The temporal similarity between two events (i and j) is computed by taking into account the temporal distance between them (TD) and the exact retention delay between their occurrence and the moment of retrieval. For mathematical details, please refer to Eqs. (1), (2) in Section 3.1. Computation of the similarity scores.
Separately for each experiment, we first asked whether the temporal distance between two events at encoding would affect subsequent retrieval performance (RT and accuracy). We predicted that the shorter the temporal distance, the more difficult/slower the order judgment would be, and that this would occur across the wide range of temporal distances and retention delays (Exp 1–4, see Table 1). Next, we turned to our main hypothesis about scale invariance for the retrieval of temporal order. We computed similarity scores that take into account both the distance between the memory probes and the duration of the retention delays, and compared the relationship between RT and similarity across the four datasets. We predicted that this relationship would be the same irrespective of temporal distances and delays.
Materials and experimental tasks
In all experiments, each retrieval trial comprised a pair of still frames extracted from the movies. Participants were instructed to identify which one of the pair happened earlier in the movie (two alternative forced-choice) and to respond as quickly and accurately as possible once the images appeared (i.e., “Which image was presented first in the movie clip?”). They responded on a two-button response box with their right index or middle finger. Stimuli in all experiments were presented with Cogent 2000 toolbox (http://www.vislab.ucl.ac.uk/cogent.php) implemented in Matlab 7.4 (The MathWorks, Natick, MA). In all experiments, the pairs of images were presented side by side. The left-right position of the target image was counter-balanced across trials. The presentation order of retrieval trials was randomised using the Matlab function “randperm” for each participant to minimise the effect of a given pair of frames influencing the retrieval performance on immediately subsequent trials. Fifteen of the 29 participants in Exp 1 (retrieval phase only) and all of the 17 participants in Exp 3 performed the experiments inside an MRI scanner, where the subjects viewed the back-projected visual stimuli via a mirror system (approx. 20 × 15° of visual angle). The remaining 14 subjects in Exp 1 and all subjects in Exp 2 and Exp 4 performed the tasks in a dimly-lit room, seated approximately 60 cm away from a 19-inch computer screen (also subtending a visual angle of 20 × 15°). Both the MRI projector and computer monitor outside the MRI scanner had a resolution of 1024 × 768 and a 60-Hz refresh rate.
Participants within each experiment were tested on the same set of movie frames at retrieval. As illustrated in Fig. 1, frame j is always earlier than frame i for all participants. Further details about the stimuli are summarised in Table 1. Below, we highlight the main differences in the movies, test materials, and retention periods.
Experiment 1
The material for encoding was a 42-min episode of the American TV series “24” (Season 6, Episode 6, 11:00–12:00) dubbed in Italian. No participants in the study had watched this particular Season before. More details about the episode can be found in Kwok, Shallice, and Macaluso (2012). At encoding on day 1, participants watched the movie in one single session. After a 24-hour retention delay, they were asked to judge the temporal order between two still frames extracted from the film. On each trial, the two frames were presented side by side for 5 s, and trials were separated by an ITI of 2 s. Participants were also tested on spatial and object memory in two other retrieval tasks. The results of the other two tasks are reported in Kwok et al. (2012).
Frame selection for retrieval was based on detailed content analyses of the movie. The 42-min episode was first divided into 89 epochs, where epoch boundaries signaled a change of setting. A hundred pairs of frames were extracted and paired up based on two criteria: (1) the two frames had to be extracted from the same storyline, and (2) the two frames had to be extracted from two different epochs. The second criterion guaranteed at least one change of setting between the two selected frames. The temporal separation of any pair ranged from 33 s to 31.7 min (see Table 1).
Experiment 2
The movie used for encoding was the same as in Exp 1. However, in Exp 2 we reduced the length of the retention delay from 24 h to 15 min (a reduction by a factor of 96). A new set of 160 pairs of images was created following the same criteria as for Exp 1 to allow us to generalise the results beyond any possible idiosyncrasies due to the choice of specific images. The temporal separation of the pairs in this set ranged from 1.7 min to 34.0 min (see also Table 1). The presentation and the response contingency were the same as in Exp 1. The order of the retrieval trials was also randomised across participants. The ITIs followed an exponential distribution that favoured short ITIs (range: 1.0–9.5 s; see below for details on how the retention delay was precisely derived for Exp 1 and 2 for analysis). Relatively long ITIs were used in Exp 1 and 2 to lessen the “carry-over” effect between fast successive retrieval trials; this concern was less acute in Exp 3 and 4 where the retrieval tests were intermixed with video clip encoding periods.
Experiment 3
Unlike Exp 1 and 2, Exp 3 presented edited segments of TV commercials. These commercials are not shown on TV in Italy and were selected using the interface supported by the Advertising Archive company (http://www.coloribus.com/). Using video editing software (Final Cut Pro, Apple Inc.), we edited these 96 commercials into short clips that maintained a coherent storyline with multiple scene switches. We preserved the background sound and used segments without dialogue. The shortest clip was 7.72 s whereas the longest was 11.4 s, with a mean length of 9.59 s (see also Table 1). These clips were shorter by a factor of about 250 compared to the 42-min episode used in Exp 1 and 2.
In order to choose pairs of frames for the retrieval test, we performed a frame-by-frame analysis to mark scene changes for each of the video clips. In each clip we ensured that each epoch contained only a scene change, giving us a number of epochs per clip (range: 4–10). We excluded images from the first and last epochs to avoid primacy and recency effects. We then randomly paired up two images from two different epochs in the remaining epochs in the given clip. This pairing resulted in a sampling of temporal distances ranging from 0.60 s to 4.96 s.
Trials in Exp 3 followed the structure: presentation of the movie clip, a short retention delay and the retrieval test. The trials began with 0.5 s of a green fixation cross, which was followed by the presentation of the clip. The screen was blanked out for a variable period of 1.5–2.5 s (sampled from a uniform distribution) before the onset of the retrieval test. Each retrieval trial contained a pair of probe images presented side by side for 4 s, in the same manner described in Exp 1 and 2. Trials were separated by a white fixation cross against a black background, with ITIs sampled from a truncated logarithmic distribution (trial durations ranged between 20.7 and 22.8 s, video presentation inclusive). 
Subjects were instructed to respond during the 4-s period in a two alternative forced-choice task, as in Exp 1–2.
Experiment 4
Test materials, parameters and response contingency were the same as in Exp 3, but the retention delay was now decreased to a fixed 0.5 s, that is, reduced by a factor of about 5 (from 1.5–2.5 s in Exp 3 to 0.5 s in Exp 4). The ITIs were constant at 3 s in Exp 4.
Data analysis/calculation
Computation of the similarity scores
For each trial, in each experiment, we combined information about the temporal distance between the two memory probes and the retention delay using the SIMPLE model (Brown et al., 2007). According to this model, where two memory traces ($M_i$ and $M_j$) differ along just a single dimension, a similarity score ($\eta_{i,j}$) can be computed as:
$$\eta_{i,j} = e^{-c\,|M_i - M_j|} \qquad (1)$$
where $M_i = \log(T_i)$ and $M_j = \log(T_j)$, $T_i$ and $T_j$ being the times elapsed between encoding and moment of retrieval for item i and j, respectively, and c is a model parameter. The logarithmic transformation of $T_i$ and $T_j$ is performed under the assumption that internal psychological magnitudes are logarithmically transformed distances (Shepard, 1987).
This transformation allows the similarity of memory traces to be expressed in terms of a temporal-distance ratio (see how $T_i$ and $T_j$ are represented on a timeline in Fig. 1). This transformation assumes that no other non-temporal dimensions are involved (see Lewandowsky, Nimmo, & Brown, 2008 for discussion on cases wherein other representational dimensions such as “position” are involved). This similarity can also be equivalently expressed as the ratio of the temporal distances of events from the time of retrieval taken to some power, c (Brown et al., 2007, Neath and Brown, 2006):
$$\eta_{i,j} = \left(\frac{T_i}{T_j}\right)^{c} \qquad (2)$$
where $T_i < T_j$. A detailed treatment of the mathematical equivalence of Eqs. (1), (2) can be found in the Appendix of Brown et al. (2007, p. 575).
Here, we first computed the similarity scores for each pair of probe images on a subject level. We then derived the constant parameters c for each experiment, based on the range of the similarity scores within each experiment, to bring the similarity ranges across four experiments into a common scale. Specifically, for each trial, we calculated the exact retention delay for a given pair of events i and j ($T_i$ and $T_j$, respectively), that is, the exact temporal distance between when the events in the movie occurred and the moment of retrieval (see Fig. 1). This distance refers to when each event occurred in the movie and when it was again presented in the retrieval test. It should be noted that in Exp 1, 2, and 3, either due to the randomised order of presentation (Exp 1 and 2) or due to the variable delays across trials (Exp 3), the temporal distances for the same retrieval trial (i.e., the same pair of images) were not the same across participants. For each subject within a given experiment, we obtained the largest $T_i$ to $T_j$ ratio (corresponding to the shortest separation between events i and j after the longest delay) and the smallest $T_i$ to $T_j$ ratio (the largest separation between events i and j after the shortest retention delay) among all retrieval trials. Following Neath and Brown (2006, pp. 205–207), we then determined the constant c in such a way that the ranges of similarity scores were identical across experiments and to maximise this range so as to produce the most representative regression slope. We made the model parameter c inversely proportional to the range of log-transformed $T_i$ and $T_j$ to achieve a fixed similarity range across the experiments. The range of similarity scores covered .675 unit of a 0–1 similarity internal scale (see Table 2, also for the minimum and maximum similarity values for each experiment).
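The two forms of the similarity score described in this section (the exponential of the scaled log-distance, and the ratio of elapsed times raised to the power c) can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names and the example values of c are assumptions, and the elapsed times are taken from the 5 s/25 s versus 5 min/25 min example given in the Introduction.

```python
import math


def similarity_log_form(t_i: float, t_j: float, c: float) -> float:
    """Similarity as exp(-c * |log(T_i) - log(T_j)|)."""
    return math.exp(-c * abs(math.log(t_i) - math.log(t_j)))


def similarity_ratio_form(t_i: float, t_j: float, c: float) -> float:
    """Similarity as (shorter elapsed time / longer elapsed time) ** c."""
    nearer, farther = min(t_i, t_j), max(t_i, t_j)
    return (nearer / farther) ** c


c_example = 1.0  # hypothetical value of c, for illustration only

# Two traces encoded 5 s and 25 s ago ...
sim_seconds = similarity_ratio_form(5.0, 25.0, c_example)
# ... and two traces encoded 5 min and 25 min ago (times in seconds).
sim_minutes = similarity_ratio_form(5.0 * 60, 25.0 * 60, c_example)
print(sim_seconds, sim_minutes)

# The two forms agree for any positive c (2.5 is again an arbitrary choice).
sim_from_logs = similarity_log_form(5.0, 25.0, 2.5)
sim_from_ratio = similarity_ratio_form(5.0, 25.0, 2.5)
print(sim_from_logs, sim_from_ratio)
```

Because only the ratio of the two elapsed times enters the score, multiplying both times by 60 leaves the similarity unchanged, which is the sense in which the measure is scale-free.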
Table 2
The ratio between temporal distance and retention delay can be understood as a similarity score. For each experiment, we report the upper and lower bounds obtained from the $T_i$ to $T_j$ ratio and subsequently used to determine the parameter c required to attain a common similarity range (.675 unit) across all experiments. In Exp 1–3, the upper/lower bounds and the similarity range varied slightly across participants. We thus used the median values for the upper/lower bounds and the similarity range. In contrast, for Exp 4 the retention delay was fixed at 0.5 s, hence the bounds and similarity were identical for all participants. See Section 3.1. Computation of the similarity scores for detailed explanation.
| Keeping similarity difference = .675 unit | Exp 1 | Exp 2 | Exp 3 | Exp 4 |
|---|---|---|---|---|
| Upper bound of similarity scores | 0.9794 | 0.9690 | 0.7689 | 0.9008 |
| Lower bound of similarity scores | 0.3027 | 0.2926 | 0.0933 | 0.2262 |
| Similarity range | 0.6752 | 0.6764 | 0.6756 | 0.6746 |
| Parameter c | 55.20 | 1.13 | 2.50 | 0.89 |
Retrieval response time and temporal distance analyses
Before addressing the main hypothesis concerning the scale invariance of temporal order retrieval across experiments (see next section), we investigated the relationship between temporal distance and retrieval performance separately for the four experiments. Using the temporal distances (TD, in ms) between the two probe images as predictors and the retrieval response times (RT, in ms) as the dependent variable, we performed a within-subject linear regression analysis for each participant. With this we sought to confirm the effect of temporal distance on retrieval times (see Hacker, 1980, Konishi et al., 2002, Milner et al., 1991), but here considering a wider range of temporal distances. Only correct trials were included in these regressions (75–85% of all trials, see corresponding accuracy data in Fig. 2), and trials with responses faster than 200 ms were discarded (<1% in each experiment). Group-level statistical inference was then performed using four separate one-sample t-tests, one for each experiment, that considered mean and standard error of the regression slopes across the subjects.
Fig. 2
Behavioural performance for Exp 1–4. Percentage correct (% correct, bar chart) and retrieval response times (RT in ms, line graph) for correct trials are depicted, arranged by retention delays (“Delay”) and duration and type of material presented at encoding (“Video”). Error bars are standard error of the mean (SEM).
As the slopes of these regression lines only indicate the size of the effect but not how close each data point is to the line, we also performed an additional “by items” analysis that averaged RT across participants for each of the items (i.e., pair of images with a specific TD). We then regressed the average RT against TD and obtained a Pearson’s r for each experiment. Given that the relationship between RT and TD does not necessarily conform to a linear model (Arzy, Adi-Japha, & Blanke, 2009), we fit this relationship also with power, logarithmic and exponential models. Finally, we performed the same “by items” analysis, but using retrieval accuracy rather than RT. For each item, we computed the average accuracy across subjects and regressed this against TD.
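The analysis steps described in this section (trial filtering, a within-subject regression of RT on TD, a one-sample t-test on the resulting slopes, and a correlation for the "by items" analysis) can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names are ours, the trial values are invented placeholders rather than data from the study, and the original analyses were not necessarily implemented this way.

```python
import math
from typing import List, Sequence, Tuple

Trial = Tuple[float, float, bool]  # (TD in ms, RT in ms, response correct?)


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def subject_slope(trials: List[Trial], min_rt_ms: float = 200.0) -> float:
    """RT/TD slope for one subject: correct trials only, RT >= 200 ms."""
    kept = [(td, rt) for td, rt, correct in trials if correct and rt >= min_rt_ms]
    return ols_slope([td for td, _ in kept], [rt for _, rt in kept])


def one_sample_t(values: Sequence[float], mu: float = 0.0) -> Tuple[float, int]:
    """t statistic and degrees of freedom for a test of the mean against mu."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(var / n), n - 1


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, as used for the item-level regressions."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trials for three subjects (placeholder numbers, not study data).
subjects = [
    [(600, 1900, True), (2000, 1700, True), (4000, 1500, True), (3000, 150, True), (1000, 2500, False)],
    [(600, 2200, True), (2000, 2100, True), (4000, 1800, True), (4900, 1750, True)],
    [(600, 1600, True), (2000, 1650, True), (4000, 1300, True), (4900, 1350, False)],
]

slopes = [subject_slope(trials) for trials in subjects]
t_stat, df = one_sample_t(slopes)
print(slopes, t_stat, df)
```

A negative mean slope with a reliable t statistic corresponds to the prediction that larger temporal distances yield faster order judgements.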
Scale invariance: the RT/similarity slopes across experiments
After computing the similarity score of each trial (see above), for each subject we performed a linear regression between the trial-specific RT and the corresponding similarity scores. The resulting RT/similarity slopes for the four experiments were first assessed using a classical one-way ANOVA. Based on the scale invariance hypothesis, for this analysis we predicted a null-result indicative of no difference between the experiments.
Next, as an alternative to conventional null hypothesis significance testing (Rouder et al., 2009, Wagenmakers, 2007), we employed Bayesian statistics seeking to confirm the lack of difference between the four datasets. Specifically, we computed Bayes factors that represent the probability of obtaining the observed data under one hypothesis (or model) relative to the probability of obtaining the data under another hypothesis. Here, we compared the full model (H1, cf. standard ANOVA) and the null model (H0). The full model included a predictor with the grand mean (α) and the predictors for Exp 1–3. The predictor for Exp 4 was omitted because of the sum-to-zero constraint. The null model (H0) included only the grand mean α, that is:
$$H_1:\; y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$
$$H_0:\; y = \alpha + \varepsilon$$
where $y$ denotes the RT/similarity slopes, $x_1$–$x_3$ the predictors for Exp 1–3 with weights $\beta_1$–$\beta_3$, and $\varepsilon$ the error term.
The Bayes factor BF is the ratio of the probability of the data under the full model H1 to the probability of the data under the null model H0; that is, BF [H1: H0] = P(D|H1)/P(D|H0), where P(D|H) is the probability of obtaining the data under hypothesis H. Calculation of the Bayes factor involves weighting the likelihood of obtaining the data with a given parameter value by the prior probability of that parameter value. Here we calculated the Bayes factors using two commonly used priors, developed for variable selection within the regression model framework: the Jeffreys–Zellner–Siow “JZS” Prior (Zellner & Siow, 1980) and the Zellner “unit information prior” (Zellner, 1986). 
These analyses were performed with a Bayesian analysis function for ANOVA designs (Wetzels, Grasman, & Wagenmakers, 2012), implemented in the SPM12 software (http://www.fil.ion.ucl.ac.uk/spm/software/spm12).
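To illustrate what a Bayes factor comparing H1 and H0 expresses, the sketch below uses the BIC approximation discussed by Wagenmakers (2007), which corresponds to a unit information prior. It is explicitly not the JZS computation of Wetzels et al. (2012) used for the reported analyses, the function name is ours, and the slope values are invented placeholders rather than data from the study.

```python
import math
from typing import List, Sequence


def bic_bayes_factor_10(groups: Sequence[Sequence[float]]) -> float:
    """Approximate BF[H1:H0] for a one-way design via BIC differences.

    H0: one grand mean for all observations.
    H1: a separate mean for each group (experiment).
    Assumes the residual sum of squares is non-zero under both models.
    """
    values: List[float] = [v for g in groups for v in g]
    n = len(values)
    grand_mean = sum(values) / n

    # Residual sums of squares under each model.
    sse_null = sum((v - grand_mean) ** 2 for v in values)
    sse_full = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        sse_full += sum((v - group_mean) ** 2 for v in g)

    # BIC = n * ln(SSE / n) + k * ln(n), with k the number of mean parameters.
    bic_null = n * math.log(sse_null / n) + 1 * math.log(n)
    bic_full = n * math.log(sse_full / n) + len(groups) * math.log(n)

    # BF[H1:H0] is approximately exp((BIC_H0 - BIC_H1) / 2).
    return math.exp((bic_null - bic_full) / 2.0)


# Hypothetical RT/similarity slopes for four experiments (placeholders).
similar_slopes = [
    [480.0, 510.0, 450.0, 530.0],
    [350.0, 520.0, 470.0, 490.0],
    [525.0, 440.0, 500.0, 515.0],
    [675.0, 430.0, 460.0, 505.0],
]
bf_10 = bic_bayes_factor_10(similar_slopes)
bf_01 = 1.0 / bf_10
print(bf_10, bf_01)
```

Values of BF[H1:H0] below 1 (equivalently, BF[H0:H1] above 1) indicate that the data are more probable under the null model, which is the direction of evidence relevant to a claim of equivalent slopes across experiments.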
Results
Relation between retrieval times and temporal distance
Regression analyses showed a consistent negative relationship between temporal distance (TD) and retrieval times (RT) in all four datasets (Table 3 and Fig. 3A). One-sample t-tests on the slopes of these regressions were significant for all the experiments, all p < .001 (see full statistics in Table 3).
Table 3
Results of one-sample t-tests against zero of the slopes of RT regressed against TD and of the slopes of RT regressed against similarity, for each of the experiments. We present the mean with standard error of the mean (SEM), t-statistics, degrees of freedom (d.f.), p-values, effect size in Cohen’s d, and the lower/upper bounds of the 95% confidence interval of the estimates. These statistics correspond to the results depicted in Figs. 3A and 4A. The effect size Cohen’s d is a descriptive statistic and refers to the strength of the effect; in our case, it refers to the relationship between RT and TD (top panel) or RT and similarity (bottom panel).
| Experiments | Mean (SEM) | t-statistics | d.f. | p-value | Effect size (Cohen’s d) | 95% confidence interval: Lower | 95% confidence interval: Upper |
|---|---|---|---|---|---|---|---|
| Slope of the retrieval times/temporal distance relation tested against zero | | | | | | | |
| Exp 1 | −111.07 (16.59) | −6.69 | 28 | <0.001 | −2.53 | −145 | −77 |
| Exp 2 | −169.00 (19.75) | −8.56 | 14 | <0.001 | −4.58 | −211 | −126 |
| Exp 3 | −0.27 (0.06) | −4.29 | 16 | <0.001 | −2.15 | −0.40 | −0.14 |
| Exp 4 | −0.31 (0.06) | −6.57 | 14 | <0.001 | −3.51 | −0.51 | −0.26 |
| Slope of the retrieval times/similarity relation tested against zero | | | | | | | |
| Exp 1 | 484 (81.30) | 5.95 | 28 | <0.001 | 2.55 | 317 | 651 |
| Exp 2 | 347 (69.79) | 4.98 | 14 | <0.001 | 2.66 | 198 | 497 |
| Exp 3 | 525 (81.84) | 6.14 | 16 | <0.001 | 3.21 | 351 | 698 |
| Exp 4 | 675 (116.66) | 5.78 | 14 | <0.001 | 3.09 | 424 | 925 |
Fig. 3
Linear regression analysis of retrieval response time (RT) as a function of temporal distance of the events (TD). (A) TD/RT slopes (ms/ms) plotted for individual participants by experiment. Each bar represents the TD/RT slope of an individual; there were 29, 15, 17 and 15 participants in Exp 1–4, respectively. (B) Regressions by item: mean RT (across participants) plotted against the corresponding item-specific TD, separately for the experiments. With this “by item” analysis, there is one data point for each trial (i.e., 100 data points in Exp 1, 160 in Exp 2, and 96 each in Exp 3 and Exp 4). The RT for each data point was obtained by averaging RTs across participants who responded correctly for that trial.
Additional regression analyses considered item-specific RT (averaged across subjects) and the corresponding TD, again testing for the relationship between temporal distance and retrieval performance separately in the four datasets (Fig. 3B). These analyses confirmed the negative relationship between these two measures in all experiments, all p < .05. Moreover, logarithmic and power functions provided a weaker fit to the data, whereas the fit provided by the exponential function was equivalent to that of the linear function (see Table 4). Analogous tests using item-specific accuracy (average performance across subjects) confirmed the relationship between retrieval performance and TD. The Pearson’s correlation coefficients were all significantly positive, r(99) = 0.20, r(159) = 0.18, r(95) = 0.34, and r(95) = 0.32 for Exp 1–4, all p < .05, suggesting that the more temporally separated the two events were, the better the discrimination accuracy for that trial.
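The comparison of linear, logarithmic, power and exponential fits in the "by items" analysis amounts to correlating transformed versions of TD and RT, following the row labels of Table 4 (RT vs. TD, RT vs. log TD, log RT vs. log TD, RT vs. exp TD). A minimal sketch is given below; the item values are invented placeholders, TD is expressed in seconds so that the exponential stays finite, and the names `TRANSFORMS` and `r_by_model` are ours.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def identity(value: float) -> float:
    return value


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# (transform applied to TD, transform applied to RT) for each model form.
TRANSFORMS: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    "linear": (identity, identity),       # RT vs. TD
    "logarithmic": (math.log, identity),  # RT vs. log TD
    "power": (math.log, math.log),        # log RT vs. log TD
    "exponential": (math.exp, identity),  # RT vs. exp TD
}

# Hypothetical item-level data: TD in seconds, mean RT in ms (placeholders).
td_s = [0.60, 1.20, 2.00, 3.10, 4.00, 4.96]
mean_rt_ms = [2100.0, 2050.0, 1980.0, 1900.0, 1870.0, 1800.0]

r_by_model = {
    name: pearson_r([fx(td) for td in td_s], [fy(rt) for rt in mean_rt_ms])
    for name, (fx, fy) in TRANSFORMS.items()
}
for name, r in r_by_model.items():
    print(f"{name}: r = {r:.3f}")
```

Comparing the magnitude of r across the four forms indicates which transformation describes the item-level relation best, which is the comparison summarised in Table 4.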
Table 4
Pearson product-moment correlation coefficients (r) and group mean slopes (β) for regression analyses of the retrieval response times (RT, correct trials only) with temporal distance (TD) as the explanatory regressor in each experiment (“by items” analyses, see Methods). We fit the regression with linear, power, logarithmic and exponential models. All models indicated a negative relation: the longer the TD between the events, the faster the RT. The linear and exponential models showed the most significant effects.

| Model | Exp 1: r | Exp 1: β (SEM) | Exp 2: r | Exp 2: β (SEM) | Exp 3: r | Exp 3: β (SEM) | Exp 4: r | Exp 4: β (SEM) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linear | −0.23⁎ | −111.07 (16.59) | −0.35⁎⁎⁎ | −169.00 (19.75) | −0.30⁎⁎ | −0.27 (0.06) | −0.31⁎⁎⁎ | −0.39 (0.06) |
| Logarithmic | −0.14 | −1.35 × 10⁻⁴ (2.94 × 10⁻⁵) | −0.24⁎⁎ | −1.54 × 10⁻⁴ (2.36 × 10⁻⁵) | −0.25⁎ | −9.77 × 10⁻⁵ (2.73 × 10⁻⁵) | −0.26⁎⁎ | −1.45 × 10⁻⁴ (2.77 × 10⁻⁵) |
| Power | −0.15 | −0.34 (0.07) | −0.24⁎⁎ | −0.33 (0.04) | −0.24⁎ | −0.24 (0.07) | −0.26⁎ | −0.33 (0.07) |
| Exponential | −0.25⁎ | −2.855 (4.004) | −0.35⁎⁎⁎ | −3.605 (3.094) | −0.29⁎⁎ | −684.58 (160.21) | −0.31⁎⁎ | −891.05 (144.23) |

Regressions fitted: Linear, RT vs. TD; Logarithmic, RT vs. log |TD|; Power, log |RT| vs. log |TD|; Exponential, RT vs. exp |TD|.
Standard error of the mean (SEM) in parentheses.
⁎ p < .05.
⁎⁎ p < .01.
⁎⁎⁎ p < .001.
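The “by items” model comparison summarised in Table 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the `td` and `rt` values are invented, the helpers `pearson_r` and `ols_slope` are hypothetical names, and the transformations simply follow the regression forms listed for Table 4. For the exponential form, TD is rescaled by its maximum before exponentiation to avoid overflow, which is an assumption of this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical by-item data: temporal distance (s) and mean correct RT (ms).
td = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
rt = [2100.0, 2040.0, 1990.0, 1900.0, 1880.0, 1790.0]

td_max = max(td)  # used only to keep exp() in a safe numeric range (assumption)
models = {
    "linear": (td, rt),                                             # RT vs. TD
    "logarithmic": ([math.log(t) for t in td], rt),                 # RT vs. log TD
    "power": ([math.log(t) for t in td], [math.log(r) for r in rt]),  # log RT vs. log TD
    "exponential": ([math.exp(t / td_max) for t in td], rt),        # RT vs. exp TD
}

results = {name: (pearson_r(x, y), ols_slope(x, y)) for name, (x, y) in models.items()}
for name, (r, beta) in results.items():
    print(f"{name:12s} r = {r:+.3f}  slope = {beta:+.4g}")
```

Because every transformation of TD here is monotonically increasing, a negative TD/RT relation shows up as a negative r under all four forms; what differs between models is how well a straight line fits after the transformation.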
Scale invariance: RT/similarity slopes across experiments
First, after combining temporal distances and retention delays in a single similarity score (cf. Section 3.1), we sought to confirm the relationship between this new measure and the retrieval times. T-tests run separately for the four experiments confirmed that the RT/similarity regression slopes were also significant in all experiments, all p < .001 (see Table 3 and Fig. 4A). Additional regressions, now using trial-specific RT and similarity scores, also confirmed the relationship between retrieval times and similarity, all p < .05 (see Fig. 4B). Note that because the similarity scores are specific to each trial and each participant, these regressions did not average RTs between subjects (unlike the “by items” analyses of TD; cf. the section above and compare panels B in Fig. 3 and Fig. 4). These results demonstrate that the more “similar” the two events to be judged at retrieval, the longer it took to judge their temporal order, for all the tested ranges of distances and delays (Exp 1–4).
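To illustrate how a single score can combine temporal distance and retention delay, the sketch below uses the basic similarity rule of the SIMPLE model (Brown et al., 2007), in which two memories are compared on a log-transformed axis of time elapsed since each event. It is an assumption-laden illustration: the exact score defined in Section 3.1 is not reproduced here, the function name `simple_similarity` and the parameter `c = 1` are hypothetical choices, and the retention delay is taken as the time elapsed since the later of the two events.

```python
import math

def simple_similarity(temporal_distance, retention_delay, c=1.0):
    """Similarity of two events on a log-compressed time axis.

    temporal_distance: time between the two events at encoding.
    retention_delay:   time from the later event to retrieval (assumption).
    Both must be in the same unit. Returns a value in (0, 1].
    """
    if temporal_distance < 0 or retention_delay <= 0:
        raise ValueError("temporal_distance must be >= 0 and retention_delay > 0")
    elapsed_late = retention_delay
    elapsed_early = retention_delay + temporal_distance
    # exp(-c * |log a - log b|) equals (shorter / longer) ** c
    return math.exp(-c * abs(math.log(elapsed_early) - math.log(elapsed_late)))

# Same ratio of distance to delay at two very different time scales.
sim_seconds = simple_similarity(10.0, 20.0)            # 10 s apart, 20 s delay
sim_minutes = simple_similarity(10.0 * 60, 20.0 * 60)  # 10 min apart, 20 min delay

# Events closer together in time are more similar (more confusable).
sim_closer = simple_similarity(5.0, 20.0)

print(sim_seconds, sim_minutes, sim_closer)
```

Under this rule the score depends only on the ratio of the two elapsed times, so multiplying both the temporal distance and the retention delay by the same factor leaves it unchanged. That ratio property is what makes a common RT/similarity slope across experiments a meaningful test of scale invariance.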
Fig. 4
RT/similarity linear regression analyses. (A) Box-and-whisker plots show that the regression slopes for the four experiments were all positive and did not differ from each other. (B) Regressions by item: individual RT plotted against the corresponding trial-specific similarity score, separately for the experiments. The dashed lines reflect the 95% confidence interval for the true line. Note that the range of similarity scores (the difference between the upper and lower bounds) is exactly .675 units in every experiment, although the absolute lower and upper bounds do not necessarily match (see also Table 2).
Next, we tested our main hypothesis that this relationship between RT and similarity would be fixed irrespective of range of temporal distances and the retention delays (scale invariance). We assessed this by comparing the RT/similarity slopes across experiments using a one-way ANOVA (data shown in Fig. 4A). We confirmed the homoscedasticity of our data (Levene’s Test for Equality of Variances, p > 0.180). As predicted, the ANOVA did not reveal any difference between experiments, F(3, 72) = 1.78, MSE = 153743.80, p = .158, η² = .069, suggesting no differences in the quantitative relationship between RT and similarity scores across the different distances and delays.

Nonetheless, these results in favour of the null hypothesis are not warranted in the framework of P-value significance testing, where one can only reject the null hypothesis and cannot gain evidence for it (Rouder et al., 2009). Acknowledging the importance of providing evidence for the null hypothesis (e.g., Gallistel, 2009), we performed Bayesian analyses to formally test for the presence of scale invariance. The result from the Bayesian hypothesis test using the JZS prior showed considerable support in favour of the null model H0, BF [H1: H0] = 5.8 × 10⁻³; the equivalent test using the Zellner g-prior (“unit information prior”) also yielded strong evidence in favour of H0, BF [H1: H0] = 2.4 × 10⁻³. According to a classification scheme for Bayes factor interpretation (Wagenmakers, Wetzels, Borsboom, & Van Der Maas, 2011), our Bayes factors provide “extreme evidence” for supporting the null hypothesis, and here the presence of scale invariance.
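The slope comparison and the reading of the Bayes factors can be sketched as follows. This is a hedged illustration only: the per-participant slopes are simulated (seeded random draws, loosely centred on a slope of a few hundred ms per similarity unit), `one_way_anova` is a hypothetical helper written for this example, and no p-value or Bayes factor is computed from the simulated data. The group sizes and the two reported BF [H1: H0] values are taken from the text.

```python
import random

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Simulated RT/similarity slopes, one per participant (hypothetical values).
rng = random.Random(0)
group_sizes = [29, 15, 17, 15]  # participants in Exp 1-4
slopes = [[rng.gauss(485.0, 390.0) for _ in range(n)] for n in group_sizes]

f_stat, df_b, df_w = one_way_anova(slopes)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # degrees of freedom match F(3, 72)

# A Bayes factor for H1 over H0 is inverted to express evidence for H0.
bf10 = {"JZS": 5.8e-3, "g-prior": 2.4e-3}  # values reported in the text
bf01 = {name: 1.0 / value for name, value in bf10.items()}
for name, value in bf01.items():
    print(f"{name}: BF[H0:H1] = {value:.0f}")
```

With 76 participants in four groups, the degrees of freedom are 4 − 1 = 3 and 76 − 4 = 72, matching the reported F(3, 72). Inverting the reported Bayes factors gives roughly 172 (JZS prior) and 417 (g-prior) in favour of H0, both above the threshold of 100 that the cited classification scheme labels “extreme evidence”.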
Discussion
We investigated the relationship between the speed of retrieval of temporal order of complex events and the temporal distance between these events. We found a negative linear relationship between retrieval times (RT) and the temporal separation between two events. This held true in four different experiments that included temporal distances ranging from just a few seconds (Exp 3–4) to tens of minutes (Exp 1–2). Next, we also considered the time between encoding and retrieval (i.e., retention delay) and computed similarity scores between probe events for each trial. We confirmed the relationship between similarity and RT: the more temporally similar two events were, the slower the retrieval. Based on Bayesian inference, we showed that this RT/similarity relationship held across experiments that included a vast range of temporal distances and retention delays. Our findings support the hypothesis that the retrieval pattern of temporal order information for episodic events is scale invariant.

Why is more time needed for the decision when two events are temporally closer? The “closeness” of two items along a single dimension determines their confusability and is thought to affect their discriminability (e.g., nominal basis for semantic details, Kelley, Neath, & Surprenant, 2013; perceptual dimensions, Neath et al., 2006; and position dimensions, Surprenant, Neath, & Brown, 2006). In memory for temporal order, one possibility is that events that occurred at proximate locations along a timeline are more confusable at retrieval, thus requiring longer decision times. This pattern is precisely what we observed: when the probe events were close together in the memory set (movie/clips), participants responded more slowly. Furthermore, it should be noted that participants were not trading RT for better accuracy, as this relationship was found to hold also when considering accuracy. Analogous patterns have been previously observed both in short-term (Hacker, 1980; Konishi et al., 2002; Muter, 1979) and in longer-term temporal order judgement tasks (with >1 day delay; Kwok et al., 2012); and here this was confirmed across four separate experiments that used the same temporal order judgment task but included substantially different ranges of distances and delays.

If a phenomenon is scale invariant, it should be impossible to determine the scale at which the phenomenon is being represented. To test our hypothesis that scale invariance governs temporal order retrieval of episodic events, we computed similarity scores (which account both for temporal distances and retention delays; cf. SIMPLE model, Brown et al., 2007) and tested whether the corresponding RT/similarity slopes were the same across experiments. Standard ANOVA analyses did not disclose any significant difference in the RT/similarity slopes. On average, for each unit of similarity increment there was an increase of 485 ms in the time required to judge the temporal order of the two events (Table 3 and Fig. 4A). Inspection of the item-by-item regression lines describing the positive relationship between similarity and RT across trials reveals the hallmark characteristic of scale invariance, with the same pattern appearing at different scales of magnification (Exp 1–4, Fig. 4B). The absence of a statistical difference between studies indicates that the variation of distances and delays across the datasets did not affect this particular aspect of retrieval (see also results of the Bayesian analyses).

Presently we made use of the similarity ratio concept merely to generate a similarity score that would allow us to compare the four datasets. We did not utilise the “recall probability” function of the SIMPLE model (see p. 544, Brown et al., 2007), which pertains to the predictions for serial position effects. Hence, unlike the modelling of free or serial recall (as shown in the working example in the appendix of Brown et al. (2007)), we expected an “absence of effect” between experiments after transforming the data (see also discussion in Chater & Brown (2008)). However, we note that here scale invariance is a null hypothesis in the sense that it hypothesises an absence of effects (i.e., same RT/similarity slopes across datasets). Accordingly, we ran Bayesian analyses to demonstrate that the retrieval principle involved in the four datasets was indeed the same irrespective of timescales. The Bayes factor is interpreted as the weight of evidence provided by the data. When the Bayes factor for H0 over H1 is over 100, as in the present case (BF [H0: H1] = 1/0.0058 = 172), this indicates that the data are over 100 times as likely to have occurred under H0 as under H1, providing compelling “extreme evidence” that the slopes were indeed invariant across experiments (Wagenmakers et al., 2011).

The scale invariance principle can be applied over different memory systems (Neath & Saint-Aubin, 2011), as has been observed in semantic serial position functions during recall of the presidents of the United States (Neath, 2010), the prime ministers of Canada (Neath & Saint-Aubin, 2011), the verses of a hymn (Maylor, 2002), and chronological order of song lyrics (Kelley et al., 2013). These authors argue that semantic serial position patterns arise from the same processing as those observed in episodic memory and that a common explanation might exist for different memory functions that tap into multiple dimensions (e.g., a mixture of episodic vs. semantic details). This implies that qualitatively similar patterns can also be observed when events are linked up along multiple domains.

The naturalistic material we used here likely entailed the contribution of factors other than temporal distances and delays. The retrieval tasks taxed memory of details of multiple characters/objects embedded in spaces, with the events evolving coherently over time. In these settings, semantic content may affect the encoding and the retrieval of the episodic events. For instance, Zauberman, Levav, Diehl, and Bhargave (2010) showed a differential time-skewing effect between related and unrelated event markers (related > unrelated; their Study 2). In the present studies we attempted to minimise any confounding effect of event boundaries by ensuring that both image frames presented for retrieval in Exp 1 and Exp 2 belonged to the same, thus semantically related, storyline (note that there are five concurrent storylines in the “24” episode used, see Kwok et al., 2012). This feature ensured that the target event (the earlier one in the movie) was always semantically linked to the foil event (the later one), thereby minimising the time-skewing effect across retrieval trials.

Nonetheless, we acknowledge that semantics, and the presence of rich contextual information, could have played a mediating role in the current finding of common retrieval patterns across datasets (i.e., analogous RT/similarity slopes). In Exp 3 and 4, the memory encoding and retrieval alternated on each trial, which is analogous to standard working memory tasks. However, because of the complex nature of the stimulus material, memory retrieval likely involved mechanisms different from those typically associated with working memory for simple displays. The number of items in each of our trials was much larger than the 4 ± 1 items stipulated by STM models (Cowan, 2001). Therefore, our results should not be simply interpreted with respect to the limited-capacity buffers of classical STM models (Baddeley and Hitch, 1974; Burgess and Hitch, 1999), which assume that all items are encoded independently (e.g., Baddeley, 2003; Cowan, 2001). Instead, the representation of memory for the naturalistic scenes can be conceptualised as integrating contextual details, and/or summary statistics, or the “gist” of the events (Brady and Alvarez, 2011; Ma et al., 2014). These proposals relying on contextual models align with predictions for delay-independent scale invariant effects in memory functions (Maylor et al., 2001; Öztekin et al., 2010), which may help explain the current finding of common retrieval patterns across long-term memory (Exp 1–2) and short-term memory (Exp 3–4) paradigms.

In summary, we showed that the retrieval times required for order judgment increased with increasing temporal similarity between episodic events, and that this held across a wide range of temporal distances between events and retention intervals between encoding and retrieval. By quantifying the temporal ratio between events using a similarity index derived from the SIMPLE model, we confirmed that the RT/similarity relationship was self-similar across different settings. This relationship was scale invariant irrespective of the retention period and the length of the movie. Our findings corroborate the emerging view that analogous memory processes may operate across time scales and that a qualitative distinction might not exist between short-term and long-term memory processes (Jonides et al., 2008; Kwok and Macaluso, 2015).