Strengthening concept learning by repeated testing.

Carola Wiklund-Hörnqvist¹, Bert Jonsson, Lars Nyberg.

Abstract

The aim of this study was to examine whether repeated testing with feedback benefits learning compared to rereading of introductory psychology key concepts in an educational context. The testing effect was examined immediately after practice, after 18 days, and at a five-week delay in a sample of undergraduate students (n = 83). The results revealed that repeated testing with feedback significantly enhanced learning compared to rereading at all delays, demonstrating that repeated retrieval enhances retention compared to repeated encoding in both the short and the long term. In addition, repeated testing was beneficial for students irrespective of working memory capacity. It is argued that teaching methods involving repeated retrieval are important for the educational system to consider.

Keywords: Test-enhanced learning; feedback; long-term retention; memory; retrieval practice

Year: 2013 PMID: 24313425 PMCID: PMC4235419 DOI: 10.1111/sjop.12093

Source DB: PubMed Journal: Scand J Psychol ISSN: 0036-5564

Introduction

Traditionally, in educational settings, tests are used as tools for evaluating students’ knowledge and assigning grades. Another aspect of testing that has largely been neglected by educationalists is its potential to serve as a way of facilitating learning (Bangert-Drowns, Kulik & Kulik, 1991; Butler & Roediger, 2007; Gates, 1917; Glover, 1989; McDaniel, Anderson, Derbish & Morisette, 2007; Spitzer, 1939). Findings from empirical memory studies have shown that taking repeated tests before the administration of a final retention test improves performance compared to traditional restudy of materials, particularly on delayed recall tests (Roediger & Karpicke, 2006, 2006b; Roediger & Butler, 2011; Weinstein, McDermott & Roediger, 2010). One explanation for this beneficial effect is that repeated testing promotes active retrieval of information from memory and offers opportunities for re-encoding of information, whereas traditional restudy techniques to a large extent rely on repeated encoding (Karpicke & Roediger, 2008). The improvement in subsequent performance after taking one or several tests is known as the testing effect (Kang, McDermott & Roediger, 2007; Karpicke & Roediger, 2008; Martinez & Martinez, 1992). This effect has been found to be stable across different kinds of materials, such as paired-associate learning (Carrier & Pashler, 1992), facts (Carpenter, Pashler & Cepeda, 2009; McDaniel, Agarwal, Huelser, McDermott & Roediger, 2011), prose passages (Agarwal, Karpicke, Kang, Roediger & McDermott, 2008; Roediger & Karpicke, 2006b), statistics (Lyle & Crawford, 2011) and learning of skills (Kromann, Jensen & Ringsted, 2009). The majority of previous studies on the testing effect have been conducted in laboratory settings with materials unrelated to real educational purposes (see Roediger and Karpicke, 2006, 2006b for a review).
Moreover, few studies have examined the testing effect in actual learning environments with ecological materials and a more realistic retention interval from an educational perspective, that is, longer than one week (see supplementary material in Rawson & Dunlosky, 2011 for a review). Carpenter et al. (2009) investigated the testing effect by means of low-stakes quizzes in an eighth-grade US history class. After a 9-month delay, the results revealed that significantly more items were recalled when they had initially been tested with feedback than when they had been studied only or not reviewed. McDaniel et al. (2007) examined the testing effect during a web-based college course on brain and behaviour. Students either took weekly quizzes with feedback or reread some critical facts related to the weekly reading assignment. The quizzes were either short-answer (SA) questions or multiple-choice (MC) questions. Approximately five weeks after the last test, students took a final cumulative MC test. The results revealed that prior quizzing, particularly in the form of SA questions, resulted in higher scores at the final test compared to rereading (McDaniel et al., 2007). Thus, the results of a few studies suggest that the testing effect has the potential to facilitate learning in real educational settings. However, it has been stressed that there is a need for additional studies of test-enhanced learning using educationally relevant materials during the progression of a course (Newcombe, 2002; Rohrer & Pashler, 2010). Knowledge of key concepts is important for students in order to gain a better conceptual understanding of the topic at hand. Several studies have shown that learning the meaning of key words is effective for reading comprehension (e.g., Beck, Perfetti & McKeown, 1982; McDaniel & Pressley, 1989). Textbook chapters used in educational settings include important key concepts that are of relevance for the specific topic.
Further, lectures aim to help students orient themselves and gain an overview of the topic under study, based on those key concepts. Thus, combining a lecture with computer-assisted learning of key concepts might further improve students’ knowledge level. On the basis of these prior studies, the current study aimed to examine whether repeated testing with feedback, using SA questions, promotes long-term retention relative to rereading of key concepts during the progression of an introductory university course. A key factor in learning studies is the duration of the retention interval. Many studies related to the testing effect have applied the first retention test after two days (Thompson, Wenger & Bartling, 1978) or seven days (see Rawson & Dunlosky, 2011; Roediger and Karpicke, 2006, 2006b; for a review). A typical finding is that restudy of material is beneficial for performance on immediate tests compared to repeated testing, whereas the opposite pattern is apparent after longer delays (Roediger and Karpicke, 2006, 2006b; Thompson et al., 1978; Wheeler, Ewers & Buonanno, 2003). In contrast to these findings, Carpenter, Pashler, Wixted and Vul (2008) showed in three independent experiments that testing with correct-answer feedback was superior to restudy at both a five-minute and a six-week delay, and the effect increased as a function of the number of practice trials. Carpenter et al. (2008) argued that the immediate testing effect might be related to the inclusion of feedback, which differs from the majority of past studies (see Roediger and Karpicke, 2006, 2006b). The results in the Carpenter et al. (2008) study are in line with those from an early study by Thompson et al. (1978), in which it was found that re-exposure (i.e. feedback) of items not recalled improved immediate performance compared to study or test without feedback.
In one of their experiments they added a third condition where subjects studied a word list once, were tested by recall, and were subsequently re-exposed to those words they had failed to recall. The findings, on the immediate test as well as two days later, revealed that more words were retained in the recall plus re-exposure condition compared to both the study and test conditions (Thompson et al., 1978). Despite claims that test-restudy practice produces durable learning (Butler, Karpicke & Roediger, 2008; Cull, 2000), the results of the Carpenter et al. (2008) study also revealed a substantial drop in performance between the 14-day delay and 42-day delay. To further investigate the robustness of testing as a method that reduces forgetting, the present study included additional follow-up sessions 18 days and five weeks after the initial practice. Testing promotes learning in the absence of feedback (Carpenter, 2009; Karpicke & Roediger, 2008), but, as pointed out above, providing feedback seems to further boost learning (Brosvic & Epstein, 2007; Butler et al., 2008; Kang et al., 2007; Roediger & Butler, 2011; Shute, 2008). Feedback may serve two important functions. First, feedback may prevent retrieval failures from being repeated. That is, including feedback may serve as an additional learning opportunity that reduces repeated retrieval failures. Second, including feedback in the form of correct answers prevents erroneous learning from occurring. That is, if an item has been erroneously recalled and feedback is not provided to correct the error, that error response has a tendency to be well-stored and repeated later (Roediger & Marsh, 2005). Fazio, Huelser, Johnson and Marsh (2010) examined how different kinds of feedback influenced learning of non-fiction passages. Over three experiments they had participants read passages followed by either no feedback, right/wrong feedback, or correct-answer feedback.
Their results revealed that providing feedback in the form of correct answers was most beneficial (Fazio et al., 2010). This is in line with previous research showing that correct-answer feedback both facilitated learning and improved retention of word-pairs one week later (Pashler, Cepeda, Wixted & Rohrer, 2005). Another important aspect is related to the timing of feedback. If the aim is to support learning, students should receive feedback immediately after their response (Brosvic, Epstein, Cook & Dihoff, 2005; Kulik & Kulik, 1988). In addition, as highlighted by Butler et al. (2007), a crucial aspect of feedback is that it ensures that students process the feedback regardless of their success in recall. The current study included immediate feedback in the form of correct answers regardless of recall success. In line with prior studies combining testing with feedback (Carpenter et al., 2008; McDaniel & Fisher, 1991; Thompson et al., 1978) it was hypothesized that testing with feedback should lead to better performance compared to the restudy condition both in the immediate test and at the delayed tests. A secondary purpose of the current study was to consider how individual differences in working memory capacity (WMC) relate to learning ability (Alloway & Alloway, 2010; Yuan, Steedle, Shavelson, Alonzo & Opezzo, 2006). This issue has not been examined extensively within the testing-effect domain, and the limited prior findings show mixed results. Agarwal, Rose and Roediger (2010) found support for test-enhanced learning being particularly beneficial for individuals with lower WMC, whereas others have failed to replicate this observation (Brewer & Unsworth, 2012).
These mixed results point to the importance of further considering individual differences in WMC when examining the effectiveness of test-enhanced learning, as this might clarify the educational significance of applying different learning methods in the classroom (Dunlosky, Rawson, Marsh, Nathan & Willingham, 2013; Tse & Pu, 2012).

Methods

Participants

Eighty-three undergraduate students registered on a cognitive psychology course participated in the study. Ages ranged from 19 to 44 years (M = 23.8, SD = 3.94). Participation was voluntary and rewarded with 300 SEK. The experiment was administered as an addition to the curriculum. Written informed consent was obtained in accordance with the Declaration of Helsinki, and the study was approved by the Regional Ethical Review Board, Sweden.

Materials and design

The materials used consisted of 57 key concepts from three topics in the assigned cognitive-psychology curriculum: attention, memory, and perception (see Appendix for examples). Each topic contained 19 key concepts. The key concepts examined were all covered in the assigned readings before the topic lecture (Galotti, 2008). Immediately after the topic lecture, the participants arrived at a computer laboratory. Participants were randomly assigned to either the repeated testing group with feedback (STfb, n = 43) or the restudy group (SS, n = 40). Each of the three learning occasions, one for each topic, included a learning phase followed by an immediate test (five-minute delay). Learning was assessed by means of a test at three different time-points: immediately, an average of 18 days later (range 15–20 days), and, in a subsample, at a five-week delay. For the five-week delay, 63 of the participants (STfb, n = 33; SS, n = 30) were re-recruited. The relatively high number of students not participating at the five-week delay was due to the fact that some students had finished their courses and left the campus. The software used for the experiment was E-prime 2.0 (Schneider, Eschman & Zuccolotto, 2002). All items were randomly presented in the center of the screen, both during the learning phase and during the following tests.

Working memory capacity

Working memory capacity (WMC) was assessed with the Automated Operation Span (Aospan; for full task details see Unsworth, Heitz, Schrock & Engle, 2005), which requires participants to remember a series of letters while performing a concurrent task (i.e. a complex working memory task). The Aospan shows good internal consistency (0.78) and test–retest reliability (0.83; Unsworth et al., 2005). This task was administered immediately after the final delayed test was completed. In the Aospan, participants judge whether a math equation yields a true or false answer and then see a letter to be remembered for later recall. After a series of equation-letter trials, with set sizes ranging from 3–7 items, participants are asked to recall the letters in the correct order. Individuals are encouraged to keep math accuracy above 85%, and feedback on math accuracy and letter recall is provided. The dependent variable used in the analysis was the total number of correct items in the correct position (Unsworth et al., 2005).
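The dependent variable described above (total number of letters recalled in the correct position) can be illustrated with a minimal sketch. The function names and the example letter sequences below are hypothetical and are not taken from the Aospan software or from the study data; the sketch also assumes that a response is compared position by position with the presented set, so letters missing from the end of a response simply earn no credit.

```python
from typing import Iterable, Sequence, Tuple


def letters_in_position(presented: Sequence[str], recalled: Sequence[str]) -> int:
    """Count letters recalled in the same serial position as presented.

    zip() stops at the shorter sequence, so letters omitted at the end
    of a response contribute nothing to the count.
    """
    return sum(1 for shown, typed in zip(presented, recalled) if shown == typed)


def total_correct_in_position(sets: Iterable[Tuple[str, str]]) -> int:
    """Sum the position-correct letters over all (presented, recalled) sets."""
    return sum(letters_in_position(p, r) for p, r in sets)


# Hypothetical responses for three sets (set sizes 3, 4 and 5).
example_sets = [
    ("FKP", "FKP"),      # all three letters in position
    ("QJRT", "QRJT"),    # only Q and T are in the correct position
    ("HLNSY", "HLN"),    # first three letters correct, last two omitted
]

total = total_correct_in_position(example_sets)
print(total)  # 8
```

A stricter all-or-nothing rule, which credits a set only when every letter is correct, would give a lower total for the same responses; the text above describes the position-by-position count.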

Procedure

Learning phase

Immediately after the topic lecture, the participants arrived at a computer laboratory. Each participant was seated in front of a computer. To familiarize the participants with the to-be-learned material, all participants studied the key concepts on paper for four minutes before the groups’ learning procedures diverged (cf., Experiment 1, Rawson & Dunlosky, 2011). Following these common learning phases (lecture and familiarization), the two groups were assigned to different experimental procedures, namely either rereading the facts (restudy group; SS) or taking a test with feedback (test group; STfb), six consecutive times. For the SS group, a key concept was presented (15 sec.), and the instruction only required subjects to study it (e.g., “The primacy effect is the higher level of retention of information presented at the beginning of a list”). For the STfb group, a key concept with the keyword left out was presented (15 sec.), and subjects were requested to type in the correct answer at a blank screen (10 sec.). This was followed by feedback in the form of the correct answer (5 sec.). For example, participants were shown “The improvement in retention of information presented at the beginning of a list?” and were requested to type in the correct answer (“primacy effect”) at a blank screen, followed by feedback, “primacy effect” (a presentation of the correct response). For both groups, each learning phase contained six learning trials, and each learning trial contained the 19 key concepts of the current topic presented in random order. There was a two-minute break after the third learning trial. During the break, all participants filled in a questionnaire that asked whether they had read the assigned readings and attended the associated topic lecture. The level of difficulty and the learning procedure were held constant across occasions. Intentional learning instructions were given.
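As a rough worked example of the timing described above, the sketch below adds up the stated per-item durations for each group. It takes the durations at face value, ignores the two-minute break and any inter-item intervals, and uses hypothetical names; it is an arithmetic illustration only, not a description of the E-prime script.

```python
# Durations in seconds, copied from the procedure description.
N_ITEMS = 19    # key concepts per topic
N_TRIALS = 6    # learning trials per learning phase

SS_PHASES = {"study": 15}
STFB_PHASES = {"cue": 15, "response": 10, "feedback": 5}


def seconds_per_item(phases: dict) -> int:
    """Total scheduled time for one key concept within one learning trial."""
    return sum(phases.values())


def learning_phase_minutes(phases: dict) -> float:
    """Scheduled minutes for a whole learning phase, excluding breaks."""
    return seconds_per_item(phases) * N_ITEMS * N_TRIALS / 60


ss_minutes = learning_phase_minutes(SS_PHASES)
stfb_minutes = learning_phase_minutes(STFB_PHASES)

print(seconds_per_item(SS_PHASES), ss_minutes)      # 15 28.5
print(seconds_per_item(STFB_PHASES), stfb_minutes)  # 30 57.0
```

Under these assumptions the scheduled time per item is twice as long in the test condition as in the restudy condition.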

Immediate test

Learning in both groups was evaluated by a test administered after a five-minute break following the final learning session (“immediate test”). During the five-minute break, all the participants answered general questions concerning perceived difficulty and judgment of learning, which prevented the possibility of additional covert retrieval. In the immediate test, participants were presented with a fact (15 sec.) and were requested to type in the correct answer (10 sec.). After they finished the test, the participants were thanked and reminded of the follow-up session. No information was provided that the students were to take the same tests at the remaining two occasions.

Delayed tests

The average retention intervals for the delayed tests were 18 days and five weeks after the initial learning phase, respectively. On the delayed tests, participants completed all three topics within the same session using the same material, procedure, and order as in the immediate test.

Statistical analyses

Statistical analyses were conducted using SPSS 18. The effects on performance of the different learning methods were assessed by two different analyses. First, a mixed-model analysis of variance (ANOVA) investigated the effects of topic, time and group. As the five-week delay involved a subsample of the full sample, independent t-tests were conducted to investigate sample differences. Partial eta-squared values indexed effect sizes for the F-values. Second, to explore whether individual differences in working memory capacity influenced performance in relation to the intervention, a series of simultaneous regression analyses were conducted at each delay. To examine a moderating effect of working memory on performance, we included an interaction term for WMC × Group (McClelland & Judd, 1993). Regression analyses were conducted with the following predictors: Group (SS, STfb) as a categorical variable, WMC as a continuous variable, and the interaction term WMC × Group. Following the suggestion by Aiken and West (1991) that continuous independent variables included in an interaction term should be mean centered in order to decrease collinearity, WMC was mean centered before being entered in the regression analysis. To control for multiple tests, the alpha level of 0.05 was Bonferroni corrected by the number of independent analyses (i.e., p < 0.01).
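The moderated regression described above can be sketched as follows. This is a minimal illustration using NumPy and ordinary least squares, not the SPSS procedure used in the study. The data are simulated, and the 0/1 coding of group, the sample size of 82, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data, NOT the study's: 82 hypothetical participants.
n = 82
group = rng.integers(0, 2, size=n).astype(float)   # assumed 0/1 dummy coding
wmc = rng.normal(38.0, 18.0, size=n)               # hypothetical span scores
score = 0.60 + 0.15 * group + 0.001 * wmc + rng.normal(0.0, 0.10, size=n)


def fit(wmc_values: np.ndarray) -> np.ndarray:
    """OLS of score on intercept, group, WMC, and the WMC x group product."""
    X = np.column_stack([np.ones(n), group, wmc_values, wmc_values * group])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    return coef  # order: intercept, group, WMC, interaction


def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Mean-center the continuous predictor before forming the product term.
wmc_c = wmc - wmc.mean()

coef_raw = fit(wmc)
coef_centered = fit(wmc_c)

# Collinearity between the group dummy and the product term.
r_raw = corr(group, wmc * group)
r_centered = corr(group, wmc_c * group)

print("interaction (raw, centered):", coef_raw[3], coef_centered[3])
print("corr(group, product) raw vs centered:", r_raw, r_centered)
```

In this simulated setup, centering leaves the interaction coefficient unchanged and mainly lowers the correlation between the group dummy and the product term. The group coefficient then describes the group difference at the mean WMC rather than at a WMC of zero.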

Results

During the learning phase there was a gradual improvement in performance across the practice trials (Fig. 1). A one-way repeated-measures ANOVA revealed a main effect of trial, F (3.22, 109.38) = 374.22, MSE = 0.06, p < 0.001, η² = 0.92, and pairwise comparisons confirmed that there was a significant improvement between each successive practice trial (all ps < 0.01).

Figure 1

Mean proportion of correct responses for the STfb group as a function of increased number of learning trials. Error bars represent ± 1 standard error of the mean.

A mixed-model ANOVA with time-point (immediate, 18-day delay) and topic (memory, attention, perception) as within-subject factors, and group (SS, STfb) as between-subject factor, was used to analyze whether topic interacted with group. The analysis revealed no significant interaction between topic and group, F (2,162) = 1.91, MSE = 0.04, p = 0.15, η² = 0.02. Likewise, no significant interaction between topic and group was found at the 5-week delay, F (2,122) = 0.55, MSE = 0.01, p = 0.58, η² = 0.01. Therefore, the topics (memory, attention, perception) were collapsed into one dependent measure that was used in the subsequent analyses. The collapsed mean proportions of performance for the two groups at the different time-points are shown in Figure 2.

Figure 2

The mean proportion of correct responses for the STfb and SS groups at the three time-points. Error bars represent ± 1 standard error of the mean.

A mixed-model ANOVA with group (SS, STfb) as between-subjects factor and time of testing (immediate, 18-day delay) as within-subject factor was conducted. Main effects of time of testing [F (1,81) = 451.41, MSE = 0.06, p < 0.001, η² = 0.85] and group [F (1,81) = 26.52, MSE = 0.02, p < 0.001, η² = 0.25] were qualified by an interaction between time of testing and group, F (1,81) = 8.03, p < 0.01, η² = 0.09. The ANOVA revealed that both groups performed at a higher level on the immediate test compared to the 18-day delay test, and that performance was in favor of the STfb group (see Fig. 2). As can be seen in Figure 2, the absolute rate of forgetting between the immediate test and the 18-day delay is greater for the STfb group (29.6%) than for the SS group (22.6%). In line with previous research (Roediger and Karpicke, 2006, 2006b), we calculated forgetting by a proportional measure [(initial recall – delayed recall)/initial recall]. Proportional forgetting showed that the rate of forgetting was quite similar in both groups (STfb = 33.6%, SS = 33.1%). As the five-week delayed test involved a subsample (n = 63), a univariate analysis of variance was conducted to examine the difference in long-term retention between the STfb and the SS group. There was a significant difference between the groups, F (1,61) = 10.55, MSE = 0.03, p < 0.01, η² = 0.15, indicating that the STfb group performed better than the SS group (see Fig. 2).
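The proportional forgetting measure can be made concrete with a short sketch. It assumes that the absolute forgetting figures are percentage-point differences in proportion correct. The initial and delayed recall levels it derives are back-calculated from the rounded percentages in the text and are not values reported by the authors; function names are hypothetical.

```python
def proportional_forgetting(initial: float, delayed: float) -> float:
    """(initial recall - delayed recall) / initial recall."""
    if initial <= 0:
        raise ValueError("initial recall must be positive")
    return (initial - delayed) / initial


def implied_initial_recall(absolute_loss: float, proportional_loss: float) -> float:
    """Initial recall implied by an absolute and a proportional loss.

    Follows from proportional_loss = absolute_loss / initial recall.
    """
    return absolute_loss / proportional_loss


# Rounded figures from the text: absolute and proportional forgetting.
stfb_initial = implied_initial_recall(0.296, 0.336)   # about 0.881
ss_initial = implied_initial_recall(0.226, 0.331)     # about 0.683

stfb_delayed = stfb_initial - 0.296                   # about 0.585
ss_delayed = ss_initial - 0.226                       # about 0.457

print(round(proportional_forgetting(stfb_initial, stfb_delayed), 3))  # 0.336
print(round(proportional_forgetting(ss_initial, ss_delayed), 3))      # 0.331
```

The example shows why the two measures can diverge: the STfb group loses more in absolute terms, but it starts from a higher level, so the loss relative to initial recall is nearly the same in both groups.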

The influence of working memory capacity

As the second purpose was to explore the role of working memory capacity (WMC) as a predictor of task performance, regression analyses were conducted for each time-point. Before examining the influence of working memory, a univariate analysis of variance (ANOVA) was conducted to check that WMC was comparable across the two groups [STfb group (M = 40.81, SD = 16.61); SS group (M = 36.50, SD = 19.18)]. There was no significant difference in WMC between the groups, F (1, 80) = 1.49, MSE = 162.61, p = 0.23. The intercorrelations among memory performance at the different time-points and WMC can be seen in Table 1.

Table 1

Intercorrelations among memory performance at the three different time-points and working memory capacity as a function of group

Measure	1	2	3	4
1. Immediate test	-	0.75**	0.73**	0.06
2. 18-day delay	0.72**	-	0.93**	0.29
3. 5-week delay	0.75**	0.93**	-	0.36*
4. WMC	0.20	0.17	0.20	-

Notes: Pearson intercorrelations (two-tailed) for the STfb group (n = 43) are presented above the diagonal, and intercorrelations for the SS-group (n = 40) are presented below the diagonal. *p < 0.05 **p < 0.01.

For the regression analyses the predictors were entered simultaneously using the enter method. Table 2 summarizes the findings of the regression analyses. For the immediate test, Group, WMC, and the interaction term Group × WMC as predictors significantly explained almost 30% of the variance in performance, adjusted R² = 0.269, F (3,78) = 10.94, p < 0.0001. For the 18-day delay, the model was significant, F (3,78) = 6.72, p < 0.0001, and explained 20.5% of the variance in performance, adjusted R² = 0.175. Finally, for the five-week delay the model remained significant and accounted for 21.8% of the variance in performance, adjusted R² = 0.178, F (3,58) = 5.40, p < 0.002. Table 2 gives information about the predictor variables entered into the model. As can be seen in Table 2, group significantly predicted performance at all time-points whereas WMC did not. Figure 3 shows the relationship between performance and WMC at the different time-points for each group. The advantage of the test group was seen across the range of WMC at all retention intervals.

Table 2

Summary of the simultaneous regression analyses predicting the performance at the three time-points with Working Memory Capacity and Group as predictors

Time	Predictors	B	SE (B)	β	t	P
Immediate test	Working memory capacity	–0.001	0.003	–0.073	–0.230	0.819
	group	–0.188	0.035	–0.513	–5.36	0.0001
	Interaction: WMC × group	0.001	0.002	0.205	0.651	0.517
18-day delay	Working memory capacity	0.004	0.003	0.431	1.28	0.202
	group	–0.118	0.032	–0.370	–3.64	0.0001
	Interaction: WMC × group	–0.001	0.002	–0.232	–0.693	0.490
5-week delay	Working memory capacity	0.004	0.004	0.454	1.25	0.215
	group	–0.118	0.040	–0.346	–2.95	0.005
	Interaction: WMC × group	–0.001	0.002	–0.202	–0.560	0.578

Note: Significant outcomes are indicated in bold.

Figure 3

Scatterplots showing the relationship between working memory capacity and mean proportion correct across time, a) immediate test, b) 18-day delay, c) 5-week delay for the different learning conditions.

Discussion

The present study tested the hypothesis that testing with feedback will lead to better performance at longer as well as shorter retention intervals compared to restudy of key concepts. We found that testing with feedback was superior to the restudy condition and that the difference was sustained over all three time-points analyzed. As expected, the testing with feedback condition significantly outperformed the restudy condition also at the immediate test. As far as we know, this is the first study that demonstrates an immediate testing effect using course material during the progression of an on-going course. Carpenter et al. (2008) as well as Kornell, Bjork and Garcia (2011) have shown immediate benefits of the testing effect that are in line with the present results. They, too, used an experimental set-up in which feedback was given, although they did not use educationally relevant material integrated in an on-going course as we did. Kornell et al. (2011) argued that testing without feedback creates a bifurcated item distribution in which items that are retrieved are high in memory strength, and items that are not retrieved are low in memory strength. When participants are provided with feedback (i.e. items are restudied), memory strength becomes high enough to surpass a threshold and the information therefore becomes recallable, hence facilitating short-term retention and preventing erroneous learning from occurring (Kornell et al., 2011). The results from the immediate test in the current study are in line with this suggestion. The present study included feedback independently of whether the response was correct or not, which possibly resulted in items being above the threshold at the immediate test (i.e., recallable). In addition, the previously reported tendency for testing to reduce forgetting (Roediger and Karpicke, 2006, 2006b) was not found in the current study when compared with the restudy condition.
The proportional forgetting between the immediate test and the 18-day delay test was similar in both groups. Why? One possible explanation can be related to the feedback component, which was in the form of the correct answer. Speculatively, some of the items might have been recalled by the support of feedback, but those items were close to the recall threshold (Kornell et al., 2011), which made them recallable at the immediate test but forgotten 18 days later. Hence, a suggestion for future studies is to use functional magnetic resonance imaging techniques to investigate the neural mechanisms related to the changes in memory strength as proposed by Kornell et al. (2011). In general, the neurocognitive mechanisms that underlie the benefits of retrieval practice are not well understood (but see Eriksson, Kalpouzos & Nyberg, 2011). The results from the test at the 18-day delay correspond well with the results from previous studies (Roediger & Karpicke, 2006b; experiment 2; Carpenter et al., 2008; experiment 2). In contrast to the results in the Carpenter et al. (2008; Experiments 1 & 2) study, which showed a decline in performance between later retention tests, the results of the present study revealed that the amount of information retained at the 18-day test and at the five-week delayed test was comparable in both groups. The interest in the testing effect has partly been driven by its educational relevance. As indicated by the performance at the first test during the learning phase, the students’ initial knowledge about key concepts is low following a lecture. However, when combined with computer-assisted learning the knowledge level improves, particularly when practicing retrieval with feedback. Historically, the idea that retrieving information from memory can increase retention is not new (see Roediger and Karpicke, 2006, 2006b for a review).
The benefit of feedback-based learning for retention and shaping of responses is well established within the operant conditioning paradigm (Skinner, 1953, 1958). In the current study, the item-by-item feedback could be considered as a ‘reinforcer’ for learning (Skinner, 1953). However, even if testing with feedback might share some common features with the operant conditioning paradigm, it should not be considered as an analogue (Kulhavy & Stock, 1989). The feedback in terms of correct answers, as in the present study, provides a repetitive or corrective feedback which is distinct from the dichotomous verification found in the “yes-no” or “right-wrong” reinforcement paradigm (Kulhavy & Stock, 1989). Recent research on feedback has also suggested that including immediate feedback is particularly important when the initial knowledge level is expected to be low and when the material is considered as complex (Shute, 2008). Thus, despite the longstanding knowledge of the benefits of test-enhanced learning and feedback within psychology, it is neither well known nor commonly implemented as a method to promote learning by students (Karpicke, Butler & Roediger, 2009). Learning is a cumulative process and a main challenge for educators is to apply methods that increase the ability of students to store and retain relevant information over long periods of time (i.e. durable learning). A practical implication of our study is that testing with feedback produces durable learning. A second purpose of this study was to examine how individual differences in working memory capacity (WMC) influenced the ability to benefit from the testing manipulation. The observed non-significance for WMC as a predictor provides no support for the Agarwal et al. (2010) suggestion that testing is especially beneficial for students with lower WMC. Rather, the findings of the present study suggest that repeated testing is beneficial compared to restudy regardless of WMC.
Furthermore, across time, our results provided only weak and non-significant support for the “rich get richer” notion, which predicts that individuals with high WMC benefit the most from the testing effect (e.g., Rapport, Brines, Theisen & Axelrod, 1997). It should be stressed, though, that the sample in this study consisted of a fairly homogeneous group of undergraduate university students, and may not be representative of the more marked interindividual WMC variability in more heterogeneous samples. Longitudinal investigation of larger groups that are more heterogeneous with regard to WMC will be needed to further test the relation between working memory capacity and the magnitude of the testing effect. A limitation of the present study is that it might be argued that the longer exposure duration of the items in the test with feedback condition contributed to the observed group difference. While this cannot be ruled out on the basis of the present data, we note that the results from previous studies indicate that additional study opportunities and study time do not eliminate the effect (Carpenter et al., 2008; Karpicke & Roediger, 2008; McDaniel et al., 2007). In addition, the participants could be considered as relatively experienced learners, and it remains to be examined whether the present findings generalize to other populations. In conclusion, by being based on educationally relevant materials, the results of the present study extend our knowledge on the effectiveness of testing as a way to strengthen learning. The results show that test-enhanced learning is beneficial over both the short and the long term, at least when applied to the learning of key concepts integrated in an ongoing university course. The study also provides knowledge about how repeated testing and repeated study are associated with working memory capacity, and indicates that testing is beneficial irrespective of individual differences in working memory capacity.
Thus, our results generate novel information on the effectiveness of testing as a learning method and contribute to bridging the current gap between cognitive psychology and educational practice.

This research was funded by Umeå School of Education, Umeå University. We would like to thank the students for participating.

25 in total

1. Teaching machines; from the experimental study of learning come devices which arrange optimal conditions for self instruction.

Authors: B F SKINNER
Journal: Science Date: 1958-10-24 Impact factor: 47.728

2. Different rates of forgetting following study versus test trials.

Authors: Mark A Wheeler; Michael Ewers; Joseph F Buonanno
Journal: Memory Date: 2003-11

3. A comparison of study strategies for passages: rereading, answering questions, and generating questions.

Authors: Yana Weinstein; Kathleen B McDermott; Henry L Roediger
Journal: J Exp Psychol Appl Date: 2010-09

4. The effectiveness of test-enhanced learning depends on trait test anxiety and working-memory capacity.

Authors: Chi-Shing Tse; Xiaoping Pu
Journal: J Exp Psychol Appl Date: 2012-07-09

5. The positive and negative consequences of multiple-choice testing.

Authors: Henry L Roediger; Elizabeth J Marsh
Journal: J Exp Psychol Learn Mem Cogn Date: 2005-09 Impact factor: 3.051

6. Test-enhanced learning: taking memory tests improves long-term retention.

Authors: Henry L Roediger; Jeffrey D Karpicke
Journal: Psychol Sci Date: 2006-03

7. Investigating the predictive roles of working memory and IQ in academic attainment.

Authors: Tracy Packiam Alloway; Ross G Alloway
Journal: J Exp Child Psychol Date: 2009-12-16

8. The effects of tests on learning and forgetting.

Authors: Shana K Carpenter; Harold Pashler; John T Wixted; Edward Vul
Journal: Mem Cognit Date: 2008-03

9. Statistical difficulties of detecting interactions and moderator effects.

Authors: G H McClelland; C M Judd
Journal: Psychol Bull Date: 1993-09 Impact factor: 17.737

10. Receiving right/wrong feedback: consequences for learning.

Authors: Lisa K Fazio; Barbie J Huelser; Aaron Johnson; Elizabeth J Marsh
Journal: Memory Date: 2010-04
