
Does the size of rewards influence performance in cognitively demanding tasks?

Joachim A. Holst-Hansen, Carsten Bergenholtz

Abstract

Classic micro-economic and psychological theories propose different implications of monetary incentives for performance. Empirical studies in sports settings show that athletes generally perform worse when the stakes are higher, while a range of lab studies involving cognitively demanding tasks have led to diverging results, supporting positive, negative and null effects of higher (vs. lower) stakes. To investigate this issue further, we present a pre-registered, randomized, controlled trial of 149 participants solving both anagram and math addition tasks. We do not find a statistically significant effect of the size of the reward on performance, self-reported effort or intrinsic motivation. We propose that future studies should contrast the potential impact of rewards across different kinds of tasks, e.g. compare tasks that solely require cognitive effort with tasks that also require motor skills, as in sports.


Year:  2020        PMID: 33085707      PMCID: PMC7577432          DOI: 10.1371/journal.pone.0240291

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

How and whether one should provide monetary incentives to individuals to improve their performance is a key question for economics and psychology, as well as for managers in organizations. A general answer is arguably unattainable, since different types of tasks, timeframes and organizational cultures seem to call for different reward systems [1-3]. When focusing on the nature of the task, meta-analyses reveal that in simple tasks, monetary incentives generally improve performance [3, 4]. This claim is in line with the micro-economic perspective, which argues that monetary incentives lead to increased motivation and effort, which, in a task that depends on effort, will lead to higher performance [5, 6]. In contrast, a psychologist might argue that if the task is enjoyable, complex or embedded in an organization rather than in an artificial lab experiment, the micro-economic explanation becomes inadequate, since intrinsic motivation will constitute a stronger influence on behavior, and monetary incentives could crowd out intrinsic motivation [7]. Because most studies have merely compared offering some monetary reward vs. not offering a monetary reward at all, it is less clear whether a higher monetary incentive has a different effect than a lower one [5]. A manager can, in principle, modify the size of the incentive, which makes the question of both theoretical and practical interest. Proponents of the micro-economic perspective will argue that higher stakes create an even stronger incentive to increase effort, and if the task allows for a further increase in effort, higher stakes should lead to better performance than lower stakes [6]. Yet, excessively high stakes could also have a number of detrimental effects.
For one, the opportunity to gain an exceptionally high reward could interfere with the participant’s focus while performing the task [5], which could decrease performance. Focusing on the magnitude of the potentially high reward might even make participants nervous, causing them to choke under pressure and further reducing the quality of their performance [5, 8]. Second, a relatively high stake might enhance the crowding-out effect. If the task invites intrinsic motivation, a higher reward can reduce the positive effect on effort from this intrinsic motivation; if this reduction exceeds the potential positive effect of the higher reward, the overall effect will be negative [7]. Third, if the given task cannot be solved more efficiently than the level of performance a low monetary incentive already elicits, a higher stake cannot have an additional positive effect, due to a ceiling effect [3]. Data from sports events involving high stakes allow an examination of the impact of higher stakes on performance in real-life scenarios. Studies ranging from basketball [9, 10], tennis [11] and biathlon [12] to golf [13] find that higher stakes generally reduce performance. More specifically, golfers putt worse when more money is at stake [13], while higher stakes diminished the quality of basketball free throws in games compared to training [10] as well as in crucial parts of the game [9]. Whereas [14] did not find a general effect in an analysis of penalty kicks in soccer, players did perform worse at home, an effect similar in kind to the influence on the performance of biathletes [12]. Yet, two caveats should be added to these kinds of sports data. First, all these studies involve situations of fairly extreme pressure, exceeding the pressure an employee would usually meet in their daily work.
Second, these studies do not rely on randomized trials, and hidden confounding factors might be shaping the results, e.g. the size of the audience, self-selection into these sports events or the higher status that beating other competitors implies. Lab studies allow for randomized controlled trials, where only the size of the monetary reward is varied. In the following we review studies relying on tasks that require cognitive skills and effort while solving some form of problem, thus excluding pure memory tasks [15] and tasks that require only a physical rather than a cognitive effort, such as typing [5]. Paradigmatic tasks include the Monty Hall problem, probabilistic challenges, anagram puzzles and additive math problems, i.e. cognitively demanding challenges most of which do not require particular experience or education. Studies that have asked participants to solve various forms of cognitively demanding tasks have led to ambiguous findings; some find a positive effect of higher stakes on performance compared to lower ones [6, 16, 17], some find a negative effect [5, 18], while others find no clear effect when comparing high vs. low stakes [17, 19–22]. While the findings seem inconclusive, the reason might be systematic discrepancies in the samples used, the type of task or the size of the rewards. Yet, these factors do not seem to create clear demarcation lines. A range of studies have relied on tasks requiring Bayesian reasoning, where some have found a positive effect of higher stakes [16, 17], whereas [20] found no effect of increasing the size of the rewards. Note that while all three studies required Bayesian reasoning, one could argue that the Monty Hall problem used by Friedman [20] should be categorized as an insight problem rather than one requiring continuous probabilistic updating. Furthermore, the exact same task (resembling an IQ test) has been used in Germany [18] and Israel [6].
The latter study offered piece rates of 0, 0.1, 1, and 3 NIS (shekel) in addition to a flat rate of 60 NIS, while the former offered a similar reward structure (0, 1, 5 and 50 eurocents) in addition to a flat rate of 5 euros. Gneezy and Rustichini [6] found that the two highest reward levels performed statistically significantly better than the group not paid a piece rate, which in turn was better than the group paid 0.1 NIS as a piece rate. Yet, in the replication study the very low piece rate outperformed both no piece rate and the high piece rate, while other differences were not statistically significant [18]. Finally, the size of the reward is also not clearly associated with higher performance. A high piece-rate reward of 50 cents vs. a low reward of 1 cent (a factor of 50) did not improve performance in Miller & Estes’ [22] study. In Ariely et al. [5] a factor of 10 generally led to a negative effect, whereas the same factor had no significant impact on quality in Mason & Watts [19], while factors of 3 and 1.5 have been shown not to lead to changes in performance [21]. Note that Mason & Watts [19] also examined whether a higher reward improved productivity, i.e. how many tasks participants on the platform Amazon Mturk were willing to carry out. Participants thus had the opportunity to continue to solve Mason & Watts’ [19] tasks or move to other, similar tasks that were paid less or more. We only investigate performance in terms of the relative number of correct solutions. Given these inconclusive findings and the fact that relatively few studies showing no effect are available, we worry that publication bias might be shaping the nature of the published results [23-25]. In order to contribute to the understanding of the implications of modifying monetary reward sizes, we present a pre-registered randomized controlled trial.
We rely on a between-subject design, where primarily Danish participants were to solve some of the same cognitively demanding tasks (additive math and anagrams) as in the most cited of the above-mentioned studies on higher vs. lower incentives [5]. Compared to former studies, we employed a medium-level difference between higher and lower stakes. We expand on the experimental design in the next section. The overall aim is to test whether a higher piece-rate reward leads to a different performance level. More specifically, we test whether substantially but not radically higher monetary rewards (a factor of five, i.e. 2 DKK (0.26 euros) vs. 10 DKK (1.3 euros) for each correctly solved task) lead to better performance when solving a cognitively demanding, yet relatively simple, task. Since the two overarching theoretical frameworks, micro-economics and self-determination theory, as well as the empirical studies outlined above [5, 6, 16–22], offer ambiguous results, we will test the following competing hypotheses. The hypotheses as well as all statistical analyses are pre-registered, unless explicitly labelled exploratory (see https://osf.io/sb6ty/; an English version can be found in S1 Appendix). Hypothesis 1: A larger piece-rate reward will lead to better performance. Hypothesis 2: A larger piece-rate reward will lead to worse performance.

Methods and data

We carry out a pre-registered, randomized controlled trial in order to identify whether a higher or lower monetary reward influences the performance level. Participants solved two tasks in sequence, a math addition problem and an anagram. Both tasks are cognitively demanding and require substantial and continuous effort, without requiring an advanced education beyond primary school. Our study can be considered a conceptual replication of the aforementioned empirical studies, being the most similar in kind to [5], albeit differing in terms of the incentive size, being a between-subject rather than a within-subject study, and relying on an increase in the number of tasks each participant engages with as well as in the length of the task duration. The study was exempted from ethical approval by a Regional Committee on Health Research Ethics and was approved to be carried out at the university lab by the relevant Human Subjects Committee. Data are available at https://osf.io/fdgms/?view_only=3cc4847e870e417586af6d186881b8da.

Participants

Our sample consists of 149 participants, collected over two rounds with 81 and 68 participants, respectively. The latter round was added during the review process to solidify the results, and S4 Appendix contains descriptive statistics for each round. This sample size allows the identification of an effect size of r = 0.161 at a two-tailed p-value threshold of 0.05 when relying on a simple linear regression. We note that Funder and Ozer [26] categorize 0.1 as a small effect size and 0.2 as a medium effect size, while Gignac and Szodorai’s meta-analysis [27] shows that 0.2 is a typical effect. Smaller effect sizes can be of interest if they accumulate [26], but in contrast to, e.g., a study on growth mindset [28], one needs to continuously re-invest the higher reward to generate the effect; the effect of rewards will thus not accumulate. We note that Ariely et al. [5] established an effect size of 0.351 for their equivalent math task when comparing high vs. low rewards. This calculation is based on our re-analysis of their data (the authors kindly responded immediately to our request for their data), using a simple linear regression to assess how many more tasks were solved in the low-reward condition. The participants are primarily from Denmark (46.31% listed Danish as their mother tongue), 54.36% were female, and the average age was 25.05 (6.35 SD). These characteristics fit the general distribution of participants in the university’s pool of lab participants. To our knowledge this is the first study of its kind to be carried out in Denmark. Pokorny [18] and Achtziger et al. [17] carried out their studies in Germany, whose student population culturally speaking probably resembles Danish students the most, even though cultural differences between Germany and Denmark exist [29].
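The minimal detectable correlation quoted above (r = 0.161 for n = 149 at a two-tailed alpha of 0.05) can be recovered by inverting the t-test for a correlation coefficient in a simple linear regression. A minimal sketch (the `scipy` usage is our illustration, not the authors' code):

```python
from scipy import stats

def minimal_detectable_r(n, alpha=0.05):
    """Smallest correlation reaching two-tailed significance
    in a simple linear regression with n observations."""
    # Critical t-value with n - 2 degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    # Invert t = r * sqrt(n - 2) / sqrt(1 - r^2) for r
    return t_crit / (t_crit**2 + n - 2) ** 0.5

print(round(minimal_detectable_r(149), 3))  # → 0.161
```

With n = 149 this reproduces the r = 0.161 threshold stated in the text.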

Experimental design

Exercises and survey

We exposed participants to relatively simple, cognitively demanding tasks that mainly require effort rather than creative insight, without being pure memory tasks: math addition and anagrams. We chose to employ two different tasks, which forces the participants to utilize different kinds of skills, thus improving construct validity. The participants were given 10 minutes to solve each of the two kinds of exercises (20 min in total), each of which consisted of 50 tasks. All exercises and surveys were completed on paper. The math addition exercise (cf. Fig 1) was to find the two numbers that sum to ten in a box of 12 numbers like the one below, an exercise identical to study two in [5]. Every box has exactly one solution.
Fig 1

Math addition task.

Example of a math addition task. The left side shows what participants saw, while the right side reveals the correct solution.

In the anagram exercise (cf. Fig 2) the participants were given 7–10 letters whose order had been randomized by an internet website [30], which uses the Fisher-Yates-Knuth shuffling algorithm. The participants had to construct an English word using these letters. All the tasks in this exercise had at least one solution, and some proved to have more than one, e.g. the letters “telsetr” can spell both “letters” and “settler”. No participant gave more than one answer to any task in this exercise.
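The Fisher-Yates-Knuth shuffle mentioned above can be sketched in a few lines; this is an illustrative re-implementation, not the website's actual code:

```python
import random

def fisher_yates_shuffle(word):
    """Return the letters of `word` in a uniformly random order."""
    chars = list(word)
    # Walk backwards, swapping each position with a randomly chosen earlier one
    for i in range(len(chars) - 1, 0, -1):
        j = random.randint(0, i)  # 0 <= j <= i, so every permutation is equally likely
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

print(fisher_yates_shuffle("settler"))  # e.g. "telsetr"
```

The output is always a permutation of the input, so every anagram task is guaranteed to have at least the original word as a solution.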
Fig 2

Anagram task.

Example of tasks. Note the third task has multiple solutions. Participants only had to find one and were not rewarded for finding more than one. No participants submitted more than one answer to any of the anagram tasks.

Achtziger et al. [17] discovered that immediate feedback could play a moderating role, influencing behavior after the feedback is received. In our experiment, the participants received no external feedback. However, most of the submitted answers were correct, and participants will hence probably have had some idea of their performance during the experiment. The participants were presented with the exercises one at a time, and in both kinds of exercises they were required to solve the tasks in the order presented (the tasks were numbered 1–50). The participants were given ten minutes for each exercise and were not allowed to transfer time from one exercise to another. The tasks were solved on paper. The average number of tasks solved was 22.51 (9.42 SD), i.e. 22.51% of all tasks. No participant solved all tasks in any exercise; 2 out of 149 participants solved more than 40 of the 50 math tasks, and no one solved more than 40 of the 50 anagram tasks. Performance in neither the low- nor the high-reward condition thus seems to face a ceiling effect, cf. [3]. The allocated time span differs from the four minutes used by Ariely et al. [5]. We chose to engage the participants for a longer duration, since a longer time period can help identify smaller differences. For example, if participant A solves 0.1 task more than participant B per minute, a four-minute trial might end with them having solved the same number of tasks, while a ten-minute trial will reveal the difference. On the other hand, we wanted to avoid inducing cognitive fatigue by having the participants work for hours.
The participants were told that a session (including additional surveys) would last forty minutes in total. No sessions exceeded this length, and none were substantially shorter. Furthermore, since [5] relied on a within-subject design, their participants ended up spending 16 minutes on the exercises under different incentive systems, which is not very different from the 20 minutes our participants spent on the exercises. The participants completed the addition exercise first. Hereafter they completed the TIPI personality survey [31], followed by the anagram exercise. Finally, they completed a survey aiming to measure three key variables identified in our literature review, i.e. effort (cf. micro-economics), intrinsic motivation (cf. self-determination theory) and focus [5]. The intention is to capture whether these factors are correlated with the size of the reward. The questions are listed in S2 Appendix.

Treatment

The treatment is the size of the reward, where the high-stakes scenario involves a reward five times as big as in the low-stakes scenario. Some previous studies have relied on much starker differences. Ariely et al. [5] employed factors of 10 and 100 in some of their experiments, while both Miller & Estes [22] and Pokorny [18] relied on a factor of 50. In contrast, Achtziger et al. [17] doubled the reward in the high-stakes scenario. Compared to former studies, a factor of five is thus at a medium level. We have tried to balance the dual aims of external validity and avoiding the risk that an effect disappears in statistical ‘noise’. A factor of five is somewhat closer to a realistic managerial scenario than a high reward 50 or 100 times higher than the low one. Even if a factor of 50 influenced behavior, it would be unlikely that such an intervention would be economically beneficial to implement. Participants in the low-reward condition received 2 DKK (0.26 euros) per correctly solved task, while participants in the high-reward scenario received 10 DKK. 30 DKK were added to the money earned from solving the tasks in order to ensure that participants reached the minimum baseline pay of 40 DKK that lab participants have to receive to participate in a study. Only 3 participants solved too few tasks and had to be boosted to 40 DKK. The reward was taxable income and was reported to the tax authorities by the university. The best-paid participant received 470 DKK (63 euros) for forty minutes of effort, which equals an hourly wage of 705 DKK before taxes. This is a high wage, but it is not unlikely that some of the participants will earn a similar wage at some point in their lives, especially considering most of them are university students. The following is the information regarding reward size given to the participants with the low [high] reward: Your expected earnings for participating in the study are between 40 [.
The participants were not informed that the experiment was about pay and performance, and were thus not aware that other participants were exposed to a different reward structure. Participants were given the following piece of information regarding the aim of the experiment when signing up: “The aim of the study is to gain more knowledge as to how performance can be influenced by contextual factors outside the participant’s control.” Deciding not to inform participants about the variation in reward structure also implies that we relied on a between-subject design instead of the within-subject design of Ariely et al. [5]. Earlier research showed that changing the reward structure could influence behavior in a second round of exercises [16]. Furthermore, in a within-subject design one risks that participants learn more efficient problem-solving strategies and improve over time. A between-subject design thus reduces the risk of ‘noise’ from learning and switching effects.

Randomization

Two sessions were run on any given experiment day, one starting at 09:00 and one at 10:00. The same times of day were chosen because cognitive fatigue can influence students’ performance on standardized tests [32], and running experiments at many different times of the day would therefore induce unnecessary variation. A random number generated by Excel was used to decide whether the first session would be run with the low or the high reward; the other session was then run with the other reward structure. No sessions were run on Saturdays or Sundays, and two days involved only one session, due to a lack of participants. The slots were posted on the laboratory’s website, and participants could then choose to participate in the experiment.

Analysis

The analysis has been preregistered (https://osf.io/sb6ty/) following the template from AsPredicted [33]. An English version can be found in S1 Appendix.

Variables in primary analysis

Variables that are not self-explanatory are described in the following.

Performance (dependent variable)

Performance is the sum of the number of solved tasks in the addition exercise and the anagram exercise. The sum is chosen as the dependent variable instead of analyzing the two exercises separately. This pre-registered approach was selected in order to reduce the degrees of freedom and because the exercises were chosen to test the same thing: the influence of reward size on performance when solving cognitively demanding tasks.

Conscientiousness (control variable)

Conscientiousness has consistently been shown to correlate with job performance [34], which is why it is included as a control variable, captured via the TIPI [31]. We selected this short personality inventory due to a worry about the potential cognitive fatigue that longer tests might imply, and because it is a control variable, not a main predictor. See S3 Appendix for an overview of the questions.

Variables in secondary analysis

In order to further investigate how the reward size might influence behavior, we also assess if self-reported measures of effort [6], intrinsic motivation [7] and focus [5] vary depending on the reward structure the participants have been exposed to. These three variables are described in the following.

Intrinsic motivation and effort

In order to measure intrinsic motivation (cf. self-determination theory) and effort (cf. micro-economics) we used questions based on the IMI [35], worded according to our setting. For reference, [36] report a Cronbach’s alpha of 0.78 for the interest-enjoyment scale (our measure of intrinsic motivation) and 0.84 for effort. Four questions capture the self-perceived effort that the participant put into solving the exercises, while four questions capture self-perceived interest and enjoyment (i.e. intrinsic motivation). All questions are listed in S2 Appendix. To calculate an intrinsic motivation score, answers to questions 2, 4 (reversed), 7 (reversed), and 8 are averaged (cf. the appendix). To calculate an effort score, answers to questions 1, 3 (reversed), 5 (reversed), and 6 are averaged (cf. the appendix). Since three hypotheses are tested in the secondary analysis, the alpha level is adjusted using Bonferroni’s method, giving an alpha level of 0.0167. We tested for differences in means using Welch’s t-test instead of Student’s t-test, since its risk of Type I error stays closer to the significance level when variances are unequal, compared to both Student’s t-test and a choice between the two tests based on a preliminary test for equality of variances [37].
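The scoring and testing procedure just described can be sketched as follows. The 7-point scale and the example data are our assumptions for illustration; the paper's appendix lists the actual items:

```python
from scipy import stats

def likert_score(answers, reversed_idx, scale_max=7):
    """Average a set of Likert items, reverse-coding the flagged ones
    (on a 1..scale_max scale, a reversed answer a becomes scale_max + 1 - a)."""
    vals = [scale_max + 1 - a if i in reversed_idx else a
            for i, a in enumerate(answers)]
    return sum(vals) / len(vals)

# Intrinsic motivation: items 2, 4 (reversed), 7 (reversed), 8 -> indices 1, 2 reversed here
motivation = likert_score([6, 2, 3, 5], reversed_idx={1, 2})

# Bonferroni-adjusted alpha for the three secondary hypotheses
alpha = 0.05 / 3  # ≈ 0.0167

# Welch's t-test (equal_var=False) comparing high- vs. low-reward group scores
high = [5.5, 4.8, 6.0, 5.2, 4.5]
low = [5.0, 5.3, 4.9, 5.6, 5.1]
t, p = stats.ttest_ind(high, low, equal_var=False)
print(motivation, round(alpha, 4))
```

Passing `equal_var=False` to `scipy.stats.ttest_ind` is what selects Welch's variant rather than Student's t-test.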

Focus

In order to measure how focused the participants were, three questions were developed; these are the last three questions in the questionnaire shown in S2 Appendix. They aim to capture whether the reward distracted the participants while solving the tasks. The average of these three answers (with the last question reversed) is a participant’s focus score.

Results

Descriptive statistics

Fig 3 illustrates that there were no substantial differences between the individuals in the two conditions. Out of the 149 participants, two did not report their age, and two did not fill out the survey designed to measure effort, intrinsic motivation, and focus.
Fig 3

Descriptive statistics.

Numbers in parentheses are standard deviations.


Primary analysis

The data have been analyzed using a multiple linear regression, with the size of the reward (low or high) as the independent variable and gender and conscientiousness added as covariates. The only pre-registered dependent variable is the total number of solved tasks, which allows us to investigate whether a high or low reward influences performance. Homoskedasticity has not been assumed, and thus robust standard errors have been used. The regression was estimated using OLS with the following regression equation:

Performance_i = β0 + β1 Reward_i + β2 Gender_i + β3 Conscientiousness_i + ε_i

Fig 4 reveals a p-value of 0.12, and thus no strong support for either a positive or negative effect of a larger reward. However, we note that the confidence interval is rather wide, and a positive effect is still contained within it.
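An estimation of this kind can be sketched with `statsmodels`. The variable names and simulated data below are illustrative assumptions, and HC3 is one common robust-covariance choice; the paper does not state which variant was used:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 149
df = pd.DataFrame({
    "high_reward": rng.integers(0, 2, n),        # treatment dummy (0 = low, 1 = high)
    "female": rng.integers(0, 2, n),             # gender covariate
    "conscientiousness": rng.uniform(1, 7, n),   # TIPI conscientiousness score
})
# Simulated outcome: baseline ~22 solved tasks plus noise
df["performance"] = 22 + 2 * df["high_reward"] + rng.normal(0, 9, n)

# OLS with heteroskedasticity-robust (HC3) standard errors
model = smf.ols("performance ~ high_reward + female + conscientiousness",
                data=df).fit(cov_type="HC3")
print(model.params.round(2))
```

The `cov_type="HC3"` argument replaces the classical standard errors with robust ones without changing the coefficient estimates themselves.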
Fig 4

Main regression.

Result from main regression.

We provide an additional, exploratory analysis, since some participants on the first day of sessions (two sessions, 16 participants) did not solve the tasks in the order presented, despite this being noted as an explicit requirement in the written instructions. In the following analysis, answers from these two sessions were scored as if the instruction had not been given. In the remaining sessions this instruction was emphasized, and those answers were scored with the instruction in mind. We include a dummy for the first day of sessions in order to control for this difference, which leads to the following equation:

Performance_i = β0 + β1 Reward_i + β2 Gender_i + β3 Conscientiousness_i + β4 FirstDay_i + ε_i

The results in Fig 5 are almost identical to the former model, showing a slightly higher p-value (0.16) and a very similar confidence interval. Thus, including the dummy does not change the conclusion, and we again cannot reject the null hypothesis that the size of the reward does not influence performance.
Fig 5

Regression with added covariate.

Result from regression with added covariate.


Secondary analysis

All three theories identified in the introduction rely on a mediating, explanatory variable. We therefore also analyse whether the size of the reward influenced effort (the mediator in micro-economics), intrinsic motivation (the mediator in self-determination theory), or focus (mentioned as a potential mediator by [5]). This allows us to test whether, even if we cannot measure an effect of reward size on performance, we can identify a change in the level of effort, motivation or focus. The descriptive data contain only 147 observations, since two participants chose not to fill out the questionnaire aimed at measuring these constructs. The data do not show a significant effect of reward size on focus, intrinsic motivation, or effort at the 0.0167 threshold. Effort has the lowest p-value (p = 0.09); however, the participants who received the low reward exerted more effort according to the data, the opposite of the prediction offered by classical micro-economics. The data can be re-analysed on an exploratory basis excluding the results from the first two sessions; however, this does not result in a statistically significant difference in any of the tests (see S4 Appendix). We also note that the low Cronbach’s α for the questions measuring focus (0.55, cf. Fig 6) implies that the associated t-test should be interpreted with caution.
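Cronbach's α, used above to flag the weak internal consistency of the focus items, can be computed from the item variances. A minimal sketch with illustrative data (not the study's actual responses):

```python
import numpy as np

def cronbach_alpha(item_matrix):
    """Internal-consistency estimate of a scale.
    Rows are respondents, columns are questionnaire items."""
    x = np.asarray(item_matrix, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return k / (k - 1) * (1 - item_var / total_var)

# Perfectly consistent items yield alpha = 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # → 1.0
```

Values well below 0.7, such as the 0.55 reported for the focus items, are conventionally read as weak internal consistency.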
Fig 6

Focus, motivation and effort related to reward size.

t-tests for equality of means, comparing high vs. low reward sizes.


Exploratory analysis

We tested whether self-reported effort, intrinsic motivation, and focus influenced performance, irrespective of the reward size. Fig 7 shows that both effort (p < 0.01 for both models) and intrinsic motivation (p = 0.018 and 0.020 for models 3 and 4, respectively) are positively and significantly related to performance, as predicted by classical micro-economics and self-determination theory. Participants who performed well thus self-reported higher levels of effort and intrinsic motivation than those who did not, corroborating that these factors are indeed important for the tasks. A statistically significant effect of focus on performance could not be detected (p = 0.318 and p = 0.294). However, this might be because the questions aimed at measuring focus did not fully capture this construct, cf. the relatively low Cronbach’s α. These results are robust to model specifications (see Fig 8, which includes covariates) and to the exclusion of data from the first two sessions (see S4 Appendix).
Fig 7

Effort, motivation and focus related to performance.

Relationship between performance and mediators without extra covariates. *, **, and *** mean the variable is statistically significant at the 10%, 5%, or 1% level.

Fig 8

Effort, motivation and focus related to performance, including covariates.

Relationship between performance and mediators with extra covariates. *, **, and *** mean the variable is statistically significant at the 10%, 5%, or 1% level.


Discussion

Results from nine former studies on the comparative impact of high vs. low rewards are remarkably mixed, since one can find support for all kinds of effects: positive, negative and none. We worry that publication bias might skew the available results, since documenting a null effect is difficult and carries a higher risk of not being published [23, 38], while documenting small effects requires substantial sample sizes. In our pre-registered, randomized, controlled trial we do not find strong evidence that higher stakes either reduce or improve performance. The absence of evidence is of course not evidence of the absence of an effect of a higher reward, but given our sample size we could have identified an effect size of r = 0.161 when relying on a simple linear regression (cf. [26]). The correlation between high reward and performance was 0.125 in our study, and with this correlation we would have required a sample size of 247 to obtain a p-value below 0.05, if relying on a simple linear regression. While we cannot rule out the existence of a small positive effect of high rewards, our study reduces the prior belief one should have in the replicability of former, relatively large effect sizes, in either direction. Furthermore, a small non-cumulative effect would generally not be economically efficient for a manager to implement. To illustrate: even if we assume that the difference our data imply is not just random fluctuation, the high rewards merely lead to approximately two extra solved tasks, out of an average of about 22 tasks solved. Yet, a manager would have to pay five times as much to gain this relatively small improvement. We should add that higher stakes would be economically beneficial for the one receiving the higher reward. Additional analysis supports the interpretation that reward structure did not strongly influence the participants in our sample.
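The required sample size of 247 quoted above can be reproduced by searching for the smallest n at which r = 0.125 reaches two-tailed significance in a simple linear regression (a sketch; the `scipy` search is our illustration, not the authors' calculation):

```python
from scipy import stats

def required_n(r, alpha=0.05):
    """Smallest sample size at which correlation r reaches two-tailed
    significance in a simple linear regression."""
    n = 4
    while True:
        # t-statistic for a correlation r with n - 2 degrees of freedom
        t = r * (n - 2) ** 0.5 / (1 - r**2) ** 0.5
        p = 2 * stats.t.sf(t, df=n - 2)
        if p < alpha:
            return n
        n += 1

print(required_n(0.125))  # → 247
```

The linear search is crude but transparent; a closed-form power calculation gives the same threshold.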
Micro-economics argues that a higher reward should lead to higher performance, since the reward structure should influence effort, while self-determination theory argues that a higher reward could harm intrinsic motivation. However, the size of the reward did not have a statistically significant effect on either the self-reported level of effort or intrinsic motivation, while both effort and intrinsic motivation were significantly positively related to performance. Thus, we do not find strong support for the size of the reward shaping the dependent variable, or the mediating variables it is supposed to influence. Overall, while former studies have shown that a piece-rate payment system generally leads to higher performance compared to flat-rate payments when individuals are solving relatively simple tasks [3], we find no clear behavioral effect from manipulating the reward size. Since our data did not imply a risk of a ceiling effect, one could speculate that receiving some kind of monetary reward is a sufficient motivator for participants in a lab study, while the opportunity to gain relatively big rewards does not lead participants to choke under pressure [5]. Yet, a range of studies from the world of sports repeatedly indicate that where reward differences are large, high stakes tend to reduce performance [9-13]. This discrepancy between sports and cognitive tasks in the lab might simply reflect the difference between a randomized, controlled trial and observational studies, which cannot exclude potentially confounding factors, such as the public scrutiny and the status competition that sports events also imply. Ideally, one would have data on sports events with only a single competitor or without an audience. However, in addition to these confounding factors, we also find good reasons to believe that different theoretical mechanisms are at play in the various settings.
All sports disciplines studied require not only a basic cognitive effort (as in the typical lab task) but also a concentrated, physical motor effort, e.g. when completing a golf putt or a basketball free throw [9, 10, 13]. These various skills arguably have a different evolutionary history: “We are all prodigious Olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it.” [39 p. 2]. One can therefore argue that we should expect performance differences depending on the skills required in the given setting. For example, driving a golf ball relies on motor skills, while a golf putt requires not only motor skills but also a cognitive effort and an ability to assess the length of the putt and the slope of the green. We consider it a promising avenue for future research to disentangle these variables. We propose that future studies investigate the contrast between a) simple, cognitively mundane tasks [5], b) insight tasks [40, 41], c) tasks requiring only motor skills (e.g. a golf drive) and d) tasks requiring motor skills as well as cognition (e.g. a golf putt). If one relies on a typical pool of lab participants, one also avoids the potential selection effect of the very experienced athletes who compete in high-performance sports. We expect that tasks solely requiring motor skills should see less of a negative effect of higher stakes than tasks that require motor skills as well as cognition. High stakes in insight tasks in particular, and cognitively very demanding tasks in general, could lead to worse performance compared to lower-stakes scenarios.
This negative effect is argued to stem from reduced intrinsic motivation, a diminished ability to focus, and a reward structure that encourages a more conservative search strategy [1], inhibiting the opportunity to find good solutions to the task at hand. Finally, all these investigations could involve a competitive element, as in sports, or not, in order to further identify the extent of this effect. To sum up, we did not identify an effect of the size of the stake involved when solving a mundane, cognitive task. We worry that the scarcity of published articles showing no effect might reflect a file-drawer bias [23, 25]. Only by providing public access to studies that cannot document an effect can we update the relevant prior one should have concerning the potential impact of large reward sizes [42].

14 Feb 2020 PONE-D-19-35879 Does the size of rewards influence performance in cognitively demanding tasks? PLOS ONE

Dear Mr. Bergenholtz, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. As you can see, the two reviewers make different points in some respects, but I see two things that I consider of high importance. First, it's the sample size. Both reviewers indicate, either directly or rather indirectly, that there actually might be an effect once the sample size is big enough. I agree with this request - if the sample size is just too small to find an effect, your null results would be an artifact. So I strongly recommend increasing the sample size - at least doubling it, but more would be better.
Secondly, Reviewer 2 made quite some comments about your argumentation with respect to what this contributes to the literature and why. Please go over these comments (and all others!) carefully and see whether and how you can address them. Overall, I agree with Reviewer 1 that the paper is in general well written and well organized. So please take care, when answering the requests from Reviewer 2, to keep that nice general outline!

We would appreciate receiving your revised manuscript by Mar 30 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. To enhance the reproducibility of your results, we recommend that, if applicable, you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript: a rebuttal letter that responds to each point raised by the academic editor and reviewer(s), uploaded as a separate file labeled 'Response to Reviewers'; a marked-up copy of your manuscript that highlights changes made to the original version, uploaded as a separate file labeled 'Revised Manuscript with Track Changes'; and an unmarked version of your revised paper without tracked changes, uploaded as a separate file labeled 'Manuscript'. Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available.
The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. We look forward to receiving your revised manuscript. Kind regards, Christiane Schwieren, Dr. Academic Editor, PLOS ONE

Journal Requirements: When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional information about the participant recruitment method and the demographic details of your participants, such as, when relevant, the recruitment date range (month and year), a description of any inclusion/exclusion criteria that were applied to participant recruitment, a table of relevant demographic details, a statement as to whether your sample can be considered representative of a larger population, a description of how participants were recruited, and descriptions of where participants were recruited and where the research took place.

3. Your ethics statement must appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please also ensure that your ethics statement is included in your manuscript, as the ethics section of your online submission will not be published alongside your manuscript.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

5.
Please ensure that you refer to Figure 1 and 2 in your text as, if accepted, production will need this reference to link the reader to the figure. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. 
Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Congratulations on a paper that you wrote in a remarkably crispy, clear and concise way. Everything that should be there is there - and nothing more (which is also an art). I like all about the paper with one exception - the sample size! Why would you run such a nice experiment with just 81 subjects - funding cannot really be an issue; neither can missing subjects be - on any university campus should you be able to get hundreds of subjects. Why is that important - because a comparison of 40 vs 40 subjects probably MISSES effects that are there - your results deliver a p-value of .33; but properly powered (400 subjects total) you would be able to deliver a much stronger message (be it a significant difference or not). Hence, I suggest accepting the manuscript as it is; but if the other referee(s) or the editor "force" you to increase sample size - do that, it may be worth it.

Reviewer #2: This paper considers the issue of whether higher stakes lead to better performance. It is a somewhat controversial topic. But to me the main issue is at the end of the range, with very high or very low incentives. This study has payoffs in one case multiplied by five and I am not sure this captures the issue. But it appears the experiments were conducted properly, so it would appear to be rigorous. I just don’t see what we learn from this study. Abstract, 2nd sentence: Seriously? Then why do teams pay pro athletes so much? Can’t really be true as stated. Introduction, 2nd paragraph: I really don’t agree with this. Higher pay clearly typically leads to higher effort. The issue of interest is really just at the extremes of the pay range. p.
9, middle: These are examples of extreme pressure, not just pressure. And the golf study is just one study. One wonders how many other studies found no effect. p. 9, lower: It's really quite silly to say that payment per se leads to worse performance. p. 9, penultimate full paragraph: I'm not sure this is an entirely fair review of the literature. First line of paragraph before Hypothesis 1: But this may well depend on where in the range you are. Hypotheses 1 and 2: Over the entire range? p. 12, lower: The between- versus within-subject could easily have mattered. Good that you mention this later. p. 13, upper: I don't know that a factor of five will matter in this range for this type of task. p. 14, Performance: This is still dubious. I'd also like to see separate analyses. p. 14, Intrinsic motivation: I’m not really keen on any of these psych measures, but presumably this was stated in the pre-analysis plan. Before Figure 4: But it does in fact look like there is a positive difference. I wouldn’t necessarily be claiming a big victory here. Secondary analysis: 1) I’m not sure how much I trust these self-determination measures. 2) With the Cronbach number so low, I would just ignore this. End of first paragraph of Discussion: I think the results are unclear here. Yes, they are cautionary. First line of penultimate paragraph: I agree with this. It's interesting. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: Yes: Juergen Huber Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

19 Aug 2020 Dear editor, We appreciate the constructive reviewer comments as well as your clear and explicit guidance for how to integrate the provided suggestions. Also, we would like to thank PLOS ONE for kindly granting us additional time, in order for us to collect more data. In the following we first address your editorial comments, after which we address each reviewer's comments one by one. We agree that increasing the sample size greatly benefits our ability to provide stronger evidence for the inferences we intend to draw. Following your recommendation, our plan was to double the sample size. Due to Covid-19 our lab was closed for about 3 months in the Spring of 2020. After reopening, we managed to increase our sample size from 81 to 149, before our university had to close down the lab again, forcing us to cancel the last session we had planned. However, the results remain insignificant.
Furthermore, given the current correlation we would need to further increase the sample size substantially, to about 247, in order to document a significant positive effect of higher stakes, as explained in the paper. In other words, the increased total sample size we now rely upon allows us to say that the sample should have shown a correlation of r = 0.16 in order for the results to be statistically significant with a simple linear regression. In short, we find no significant result, and if there is an effect, it is relatively small – and certainly economically inefficient. We’d also like to emphasize that our sample size is at the high end, compared to the 9 studies we refer to in our theoretical framework. As in other related theoretical fields, effect sizes of the impact of high (vs. low) rewards seem to turn out smaller than original studies indicate. We should also add that the performance in each condition in the newly collected data is almost identical to our former results, indicating that we should not worry that behavior was different in the Summer of 2020, compared to pre-pandemic times (see S4 Appendix for the relevant table). Our country was functioning quite normally in the relevant period. All findings, tables and figures have been updated, integrating the new data and points we make in the above section. In accordance with your suggestion, we have maintained the general outline of the paper, but also adjusted some phrasings to ensure that we write, for example: “we do not find a significant difference between the two conditions”, rather than, for example: “our results show that a higher stake does not lead to increased performance”. In essence, we report a null-finding, yet we also document that given our setup, one requires fairly large sample sizes, if one wants to showcase robust significant results. Response to Reviewer 1 Reviewer comment: Congratulations on a paper that you wrote in a remarkably crispy, clear and concise way.
Everything that should be there is there - and nothing more (which is also an art). Reply: Thank you very much for your gracious remarks on our paper! Reviewer comment: I like all about the paper with one exception - the sample size! Reply: We agree on your concern about the sample size and appreciate the opportunity to enhance the strength of our inferences. We therefore planned to double the sample (following the recommendation by the editor). However, due to Covid-19 our lab was closed for about 3 months in the Spring of 2020. After reopening, we managed to increase our sample size from 81 to 149, before our university had to close down the lab again, forcing us to cancel the last session we had planned. However, the results remain insignificant. Furthermore, given the current correlation we would need to further increase the sample size substantially, in order to document a significant positive effect of higher stakes; to about 247, as explained in the paper. In other words, the increased total sample size we now rely upon allows us to say that the sample should have shown a correlation of r = 0.16 in order for the results to be statistically significant with a simple linear regression. In other words, we find no significant result, and if there is an effect, it is relatively small – and certainly economically inefficient. We’d also like to emphasize that our sample size is at the high end, compared to the 9 studies we refer to in our theoretical framework. As in other related theoretical fields, effect sizes of the impact of high (vs. low rewards) seem to turn out smaller than original studies indicate. We should also add that the performance in each condition in the newly collected data is almost identical to our former results, indicating that we should not worry that behavior was different in the Summer of 2020, compared to pre-pandemic times (see S4 Appendix for the relevant table). Our country was functioning quite normally in the relevant period, btw. 
All findings, tables and figures have been updated, integrating the new data and points we make in the above section. In accordance with your suggestion, we have maintained the general outline of the paper, but also adjusted some phrasings to ensure that we write, for example; “we do not find a significant difference between the two conditions”, rather than, for example: “our results show that a higher stake does not lead to increased performance”. In essence, we report a null-finding, yet we also document that given our setup, one requires fairly large sample sizes, if one wants to showcase robust significant results. Again, thank you for your encouraging comments. Response to Reviewer 2 We would like to thank you for engaging with our paper and challenging our choices and findings. In the following we respond to each of your questions and comments. Reviewer comment: This paper considers the issue of whether higher stakes lead to better performance. It is a somewhat controversial topic. But to me the main issue is at the end of the range, with very high or very low incentives. This study has payoffs in one case multiplied by five and I am not sure this captures the issue. But it appears the experiments were conducted properly, so it would appear to be rigorous. I just don't see what we learn from this study. Reply: Thank you for challenging our approach with this comment. Former studies investigating the impact of higher (vs. lower) stakes demonstrate varying results, even across a range of differences in the size of the stakes. We worry that publication biases might shape what (significant) results are available, and, it is certainly not clear what effect sizes one should expect. We decided on a factor of five (medium difference compared to former studies), in order to balance the likelihood of maximizing the difference (cf. the Maxmincon principle, Kerlinger 2006), and improve external validity. 
In other words, a very high factor is less likely to be useful to implement in an actual organizational context. Therefore, we think it would be relevant to know if it is difficult to establish an effect when multiplying the stakes by five, which already can be argued to be at the high end of what would be realistic in an organizational context. However, we agree that investigating the challenge at the end of the range (very high or very low) would be highly interesting as well, for theoretical reasons. Reviewer comment: Abstract, 2nd sentence: Seriously? Then why do teams pay pro athletes so much? Can't really be true as stated. Reply: We agree that the original phrase was unclear and did not convey our intended meaning. We aim to communicate that observational data indicate that increasing the stakes does not seem to enhance the performance, possibly due to choking effects (as Ariely et al. (2009) speculate). The sentence has been rephrased and we appreciate your attentive eye. Reviewer comment: Introduction, 2nd paragraph: I really don't agree with this. Higher pay clearly typically leads to higher effort. The issue of interest is really just at the extremes of the pay range. Reply: In the paragraph in question we do not stipulate that higher pay might lead to higher effort. Rather, we discuss the impact on performance. As mentioned above, empirical studies on performance demonstrate ambiguous effects, some emphasizing choking effects (and thus a negative effect of higher stakes) while others find a positive effect on performance of higher stakes. Incidentally, we could add that our original motivation to engage in this study was Ariely et al.’s (2009) highly cited finding that performance would be lower, if you increase the rewards for completing cognitively mundane tasks. Our results do not corroborate Ariely et al.’s (2009) findings. Reviewer comment: p. 9, middle: These are examples of extreme pressure, not just pressure.
And the golf study is just one study. One wonders how many other studies found no effect. Reply: Thank you for making this relevant point; sports are usually about extreme pressure situations. We have rephrased the section to make explicit that such observational sports studies are looking at more extreme situations than a regular employee typically meets. We also acknowledge that this field could also be influenced by publication bias, hiding no-effect findings. Reviewer comment: p. 9, lower: It's really quite silly to say that payment per se leads to worse performance. Reply: We assume the reviewer refers to this part: “[…] some studies find a positive effect on performance of higher stakes compared to lower (Castellan Jr., 1969; Achtziger, et al., 2015; Gneezy & Rustichini, 2000), others find a negative effect (Ariely, et al., 2009; Pokorny, 2008) […]” We consider our choice of words to be in line with the statements made by the original authors who write: “Second, and more importantly, the performance of participants was always lowest in the high payment condition when compared with the low- and mid-payment conditions together.” (Ariely, et al., 2009, p. 458). Furthermore, we also refer to Pokorny (2008, p. 255): “Indeed, subjects who are offered very low incentives perform significantly better than those in the NI and the HI treatments. This provides evidence for a positive effect of very low piece rates on work effort in this context.” Overall, we report the diverging empirical results as well as contrasting theoretical perspectives (micro-economics vs. psychology). We think the fact that empirical results point in different directions, and that some imply results that from a micro-economic perspective appear counter-intuitive, provides further rationale for the relevance of the study. Reviewer comment: p. 9, penultimate full paragraph: I'm not sure this is an entirely fair review of the literature.
Reply: We have tried to engage in an extensive search of the literature, including looking at all studies that cite older studies (e.g. Castellan Jr. 1969) or the highly cited studies such as Ariely et al. (2009; 949 citations as of August 2020). We would of course appreciate being made aware of other, relevant studies. Reviewer comment: First line of paragraph before Hypothesis 1: But this may well depend on where in the range you are. Reply: We agree that it would be interesting to focus on (for example) the very high range, but consider this outside the scope of our pre-registered approach. We have adapted the paragraph right before the hypothesis, to make our intent even clearer. Reviewer comment: Hypotheses 1 and 2: Over the entire range? Reply: As outlined in the comment above, we can only test our pre-registered hypothesis, i.e. comparing high vs. low stakes involving a difference of a factor of five. We have adapted the paragraph right before the hypothesis, to make our intent even clearer. Reviewer comment: p. 12, lower: The between- versus within-subject could easily have mattered. Good that you mention this later. Reply: Thank you for this comment. We agree that this difference is meaningful. Reviewer comment: p. 13, upper: I don't know that a factor of five will matter in this range for this type of task. Reply: We also did not know in advance what impact a factor of five would have for this type of task at this level of reward. The difference in reward is 8 DKK (more than 1 Euro) per solved task, leading some of the highest-performing participants to earn more than 60 Euros for their effort. If we had asked students about the difference – after finishing the session – we guess that they would find the difference between (for example) 15 and 75 Euros important. We hope future research can examine different levels of rewards, in order to clarify if the effect is different for different levels of rewards and different factors between the groups.
Reviewer comment: p. 14, Performance: This is still dubious. I'd also like to see separate analyses. Reply: We assume that 'separate analyses' means that there should be two regressions; one with solved adding tasks as a dependent variable and one with solved anagram tasks as a dependent variable. We appreciate the skepticism and interest in seeing a more fine-grained analysis. However, we have prioritized following the pre-registered analysis plan, rather than risking finding spurious results by running too many statistical tests. We have made the data available (due to privacy reasons, age has been removed from the dataset) and other researchers can therefore run their own statistical tests on the data. Reviewer comment: p. 14, Intrinsic motivation: I'm not really keen on any of these psych measures, but presumably this was stated in the pre-analysis plan. Reply: These measures are listed in the pre-registration, and hence we have decided to keep them in the article. Reviewer comment: Before Figure 4: But it does in fact look like there is a positive difference. I wouldn't necessarily be claiming a big victory here. Reply: We agree that it is important to transparently document the (in)significance of findings, rather than claiming victories that the data can't support. We have updated our results following our additional data collection. As you indicate, we did showcase an insignificant difference in the first submission already. The difference is almost identical for the second round of data collection, leading to a smaller yet still insignificant p-value. Given the current correlation between stakes and performance, we would need a sample size of about 247 to identify a significant effect (if relying on a simple linear regression). The effect size is thus relatively small at best, if not mere random fluctuation. All findings, tables and figures have been updated.
We have adjusted some phrasings to ensure that we write, for example: “we do not find a significant difference between the two conditions”, rather than, for example: “our results show that a higher stake does not lead to increased performance”. In essence, we report a null-finding, yet we also document that given our research design, one requires fairly large sample sizes, if one wants to showcase robust significant results. In any case, the data are transparently provided and a future meta-analysis can integrate them. Reviewer comment: Secondary analysis: 1) I'm not sure how much I trust these self-determination measures. 2) With the Cronbach number so low, I would just ignore this. Reply: 1) We have updated the Cronbach numbers for our measures, following our additional data collection, and also added a paragraph on the reliability of these scales. 2) The lowest number (focus) increased to 0.55 (from 0.45), while the other remained above 0.70. We agree that the number for focus is very low, yet have chosen to keep these measures in the article. First, they are described in the pre-registration. Second, Ariely et al. (2009) indicated that the construct could be important, and while our measurement approach might have been ineffective, it can help provide guidance for future researchers' attempts to measure the construct. Reviewer comment: End of first paragraph of Discussion: I think the results are unclear here. Yes, they are cautionary. Reply: We have deliberately phrased our summary in a cautionary manner (as also exemplified in our former comments), since demonstrating a null effect is by definition difficult. We do find it relevant to provide evidence that a positive impact of higher stakes would – if it exists – be relatively small (following reflective guidelines in Funder & Ozer 2018, and Gignac & Szodorai 2016), and would thus appear to be ineffective to implement. Reviewer comment: First line of penultimate paragraph: I agree with this.
It's interesting. Reply: We appreciate your encouraging comment. References Kerlinger, F. (2006). Ch4: Research design as variance control, In D. de Vaus (Ed.), Research Design Volume I, (pp. 57-66). London: Sage. Submitted filename: Response to reviewers, final.docx Click here for additional data file. 24 Sep 2020 Does the size of rewards influence performance in cognitively demanding tasks? PONE-D-19-35879R1 Dear Dr. Bergenholtz, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Christiane Schwieren, Dr. Academic Editor PLOS ONE Additional Editor Comments (optional): As you can see from the reviewer comments below, Reviewer 2 is not yet entirely happy with how you handle the issue of the relatively small "large" incentives. 
From my own reading of the paper, I only partially agree with him. For a student this is a significantly larger incentive, but, of course, there is still the possibility that it might not be enough to reach "choking". Thus, if you think this might make sense, you could discuss a bit more why you consider a null result for a "not that huge" increase in incentives to be sufficient to say that there probably is no negative effect of "large" incentives. Overall, however, I think you have done this with sufficient care and have also avoided overselling on that matter.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: (No Response)

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Partly

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file).
The data should be provided as part of the manuscript or its supporting information, or deposited in a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters.)

Reviewer #1: (No Response)
Reviewer #2: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Juergen Huber
Reviewer #2: No

29 Sep 2020

PONE-D-19-35879R1
Does the size of rewards influence performance in cognitively demanding tasks?

Dear Dr. Bergenholtz:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Christiane Schwieren
Academic Editor
PLOS ONE
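A side note on the sample-size figure quoted in the authors' reply ("a sample size of about 247"): figures of this kind typically come from a standard statistical power calculation. A minimal sketch of such a calculation, using a two-sample normal approximation and a purely hypothetical effect size of d = 0.36 (the paper's actual observed effect is not reproduced here):

```python
import math
from statistics import NormalDist

def total_n_for_power(d, alpha=0.05, power=0.80):
    """Total N (two equal groups) needed to detect a standardized mean
    difference d with a two-sided two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    n_per_group = 2 * (z_alpha + z_power) ** 2 / d ** 2
    return 2 * math.ceil(n_per_group)

# A hypothetical small effect of d = 0.36 requires roughly 240+ participants,
# the same ballpark as the ~247 the authors report for their own design.
print(total_n_for_power(0.36))  # 244
```

The normal approximation slightly understates the n a t-test would require, but it illustrates why a "relatively small at best" effect pushes required samples well past the study's 149 participants.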