Literature DB >> 34138907

Three heads are better than two: Comparing learning properties and performances across individuals, dyads, and triads through a computational approach.

Tsutomu Harada1.   

Abstract

Although it is commonly held that two heads are better than one, related studies have argued that groups rarely outperform their best members. This study examined not only whether two heads are better than one but also whether three heads are better than two or one in the context of two-armed bandit problems, where learning plays an instrumental role in achieving high performance. This research revealed a U-shaped relationship between performance and group size: performance was highest for individuals and triads but lowest for dyads. Moreover, this study estimated learning properties and determined that a high inverse temperature (exploitation) accounted for high performance. In particular, it was shown that group effects on the inverse temperature in dyads did not generate values that surpassed the averages of the two group members. In contrast, triads gave rise to higher values of the inverse temperature than the averages of their individual group members. These results were consistent with our proposed hypothesis that learning coherence is likely to emerge in individuals and triads, but not in dyads, which in turn leads to higher performance. This hypothesis is based on the classical argument by Simmel that while dyads are likely to involve more emotion and generate greater variability, triads are the smallest structure that tends to constrain emotions, reduce individuality, and generate behavioral convergence or uniformity because of "two against one" social pressures. As a result, three heads or one head were better than two in our study.


Year:  2021        PMID: 34138907      PMCID: PMC8211165          DOI: 10.1371/journal.pone.0252122

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Many people believe in the promise of teamwork and synergy, holding that the whole is greater than the sum of its parts: "two heads are better than one" or "none of us is as smart as all of us." However, the number of heads yielding optimal levels of synergy in decision-making remains unknown. While synergies could be generated by knowledge sharing or diverse perspectives, it is difficult to specify the exact factors associated with synergy. Hence, this study addressed this question by taking a computational approach to group decision-making in a simple Q learning model with two options and statistically identified the causes of synergy. Notably, some related studies emphasized the significance of team synergy. Surowiecki [1] showed through numerous case studies that the collective wisdom of a large group of individuals proves to be correct more often than the judgment of a single decision-maker. Nevertheless, this result hinges on the conditions that (1) each person has private information, even if it is an eccentric interpretation of known facts; (2) people's opinions are not determined by the opinions of those around them; (3) people can specialize and draw on local knowledge; (4) some mechanism exists for turning private judgments into a collective decision; and (5) each person trusts the group as a whole to be fair. If these conditions are violated, a number of dysfunctional dynamics occur, resulting in ineffective group pressure or groupthink. Extensive work on collective decision-making has reaffirmed the proposition that groups rarely outperform their best members [2, 3]. One robust finding on the determinants of group performance is that group decisions are governed by a confidence heuristic [4, 5].
This notion implies that group discussions are dominated by the more confident members of a group, whose responses are, in general, more likely to be correct than those of less confident members [6]. Recently, the ability of groups to combine individual information has been intensively examined through signal detection experiments on group decision-making [6-14]. Bahrami et al. [9] found that interactive decision-making between two individuals was better than one when they shared a similar visual sensitivity and were given an equal opportunity to communicate freely. Nevertheless, if two individuals exhibit different visual sensitivities, their joint performance is generally worse than that of one head. These findings were accounted for by the weighted confidence sharing (WCS) model, in which two heads accurately communicate their level of confidence on every trial. In another study, Bahrami et al. [8] revealed that groups whose members were heterogeneous in terms of perceptive abilities tended to perform poorly. The lower performance of heterogeneous groups implies that the way individual information is aggregated is not necessarily efficient. According to Bahrami et al. [9], this is because groups use a suboptimal decision rule that puts more weight on the information provided by the least able member. Thus, the greater the difference between group members' information reliabilities, the greater the resulting efficiency loss. Apart from signal detection experiments, Woolley et al. [15] identified a collective intelligence factor that did not strongly correlate with the average or maximum individual intelligence of group members but corresponded well with the average social sensitivity of group members, the equality in distribution of conversational turn-taking, and the proportion of females in the group.
Several investigations in organizational behavior emphasize the negative aspects of teams, such as group pressure [16-21], risky shift [22, 23], social loafing [24], interpersonal competition [2], and groupthink [25, 26], leading to collective unintelligence. While the signal detection approach succeeded in rigorously formulating the underlying group dynamics in decision-making, group decision-making in such experiments did not involve intertemporal learning; instead, it highlighted simple perception problems. In more complicated problems that do not assume correct solutions, the internal mechanism of sharing information and confidence across interacting members in the WCS model does not necessarily lead to higher performance. Instead, sharing consistent learning rules seems more important. Moreover, while the signal detection approach highlighted the problem of individual vs. dyad, the effects of larger group sizes on performance remain to be examined. In particular, this study is interested in examining whether three heads perform better than two. The two heads vs. three problem raises interesting new issues in group decision-making that do not arise in the two heads vs. one case, namely, even-sized vs. odd-sized groups [2, 27]. Small groups are likely to break into two coalitions. If a group is even-sized, the two subgroups are equal in size; in this case, since the majority rule cannot be applied, subgroup dynamics might lead to deadlock [28-31]. In contrast, if a small group is odd-sized, a minority and a majority subgroup emerge, and the majority influence provides a clear direction and group cohesion [2, 27, 32, 33]. In our context, this argument could be related to the coherence of group learning. Because majority rule cannot be applied to dyads, decision-making in learning situations may eventually become incoherent.
At one moment, one member may make a decision based on her learning preference, and at another moment, another member may take the initiative in decision-making. As a result, the group learning strategy is likely to become incoherent over time. In contrast, because majority rule can be applied to triads, majority subgroups may make decisions based on their own learning strategies; thus, group learning strategies may be more coherent. Consequently, triad learning performance may outperform dyad learning performance due to the former's learning coherence and the latter's learning incoherence. Notably, both triads and individuals can pursue coherent learning strategies; hence, their relative performance may not be predicted in advance without imposing further conditions. However, we may predict that a U-shaped relationship in learning performance emerges across individuals, dyads, and triads. S1 Appendix presents a simple model of individual and triad learning coherence and dyad learning incoherence. Thus, this study's main hypothesis was that a U-shaped relationship emerges across individuals, dyads, and triads because learning coherence is more likely in individuals and triads and learning incoherence is more likely in dyads. To test this hypothesis, this study arranged several experimental settings for small groups who relied predominantly on online face-to-face communication. Most of the members were not acquaintances and were communicating with each other for the first time, which controlled for the effects of group pressures and secured psychological safety. This stratagem was viable because participants did not have to worry about personal relationships and could focus their attention on group tasks. Under this controlled environment, this study attempted to identify how group decision-making differs from individual decision-making in terms of performance and learning properties such as the exploitation/exploration ratio.
This study ran experiments with individuals, dyads, and triads so that the effects of group sizes from one to three could be evaluated. Hence, this study could test the efficacy of both two heads and three heads in comparison to one. Furthermore, this study did not depend on signal detection experiments, as it was more interested in the learning properties of group decision-making than in information sharing and filtration. Thus, this study adopted a reinforcement learning (RL) framework [34] to account for decision-making and learning behaviors in two-armed bandit (TAB) problems, the standard setting for model-based analysis of choice behavior. The RL framework has been extensively studied in the context of multi-armed bandit problems, in particular in close association with neural signals in cortical and subcortical structures [35-38]. Moreover, the RL framework has also been adopted to study learning behavior in many social contexts [39-45]. Nevertheless, to the best of our knowledge, this framework has not been applied to the study of group decision-making. One advantage of this computational approach is that learning parameters can be estimated for groups and compared across and within groups of different sizes. This new approach allowed us to rigorously estimate and characterize the properties of group dynamics.

Methods

Participants

The experiment in this study was implemented in one of the undergraduate courses the author taught at Kobe University. Initially, a total of 336 students participated in the experiment for course credit, but 14 participants were excluded before the analysis because they took only one of the three tests in this experiment. As a result, the sample in this study consisted of a total of 322 healthy undergraduate students (i.e., 100 females, age range = 19–25 years, SD = 1.21). All participants and their academic advisers signed informed consent before the experiment, which was approved by the local Ethics Committee at the Graduate School of Business Administration, Kobe University.

Experiment

In Test 1, each participant undertook the TAB independently. In Test 2, participants formed pairs and undertook the TAB. In Test 3, participants played the TAB in groups of three. In total, this study conducted seven sessions of the TAB using the online communication software Zoom. In each session, the three tests were randomly assigned to participants using Zoom breakout sessions to control for learning effects. All tests were performed with PsyToolkit [46, 47]. Group members in Tests 2 and 3 communicated freely via Zoom during the session while sharing test screens in PsyToolkit and made their choices. Participants were required to complete the tests within 40 minutes, and most groups finished within 30 minutes. Additionally, there was a one-week interval between successive sessions. All participants in this sample undertook Test 1 once and at least one of Tests 2 or 3. In Test 2 (dyads), 230 and 23 participants played the TAB once and twice, respectively. In Test 3 (triads), 153, 66, and 13 participants played once, twice, and three times, respectively. Since Test 3 required more participants than Test 2, more participants undertook it more than once. Random assignment of participants mitigated learning effects across Tests 2 and 3. For example, participants might have experienced two rounds of Test 3 first and then participated in Test 2, in which case their learning might have carried over to the last experiment. By randomizing the order of assignment to experiments, these effects were expected to be mitigated in the pooled sample. The total numbers of dyadic and triadic groups were 138 and 108, respectively. For the purpose of group comparison, we also constructed a subsample in which at least one member of each dyadic and triadic group undertook Test 3 and Test 2, respectively. In total, 262 individuals were in this subsample.
In this subsample, 116 dyadic groups and 93 triadic groups were identified. In Tests 2 and 3, the numbers of individuals who took the tests once, twice, three, and four times were 84, 118, 49, and 11, respectively. Of the dyads, 216 individuals participated, and 200 and 16 undertook Test 2 once and twice, respectively. Of the triads, 205 individuals participated, and 142, 52, and 11 individuals undertook Test 3 once, twice, and three times, respectively.

Two-armed bandit problem

In the TAB problem, participants completed a series of 100 choices between two boxes. On each trial, participants selected either the right or left box, and immediately after clicking, the reward appeared (Fig 1). The reward was either 10 points or 0 points per choice, and participants were required to maximize the total reward over the 100 choices.
Fig 1

Two-armed bandit problems (TAB).

Participants chose and clicked on either the right (blue) or left (green) box. Immediately after clicking, a reward of either +10 or 0 points appeared. In this figure, the left box is selected first, yielding 10 points, followed by the left and right boxes with rewards of 0 and 10 points, respectively. Participants made this selection 100 times to maximize total rewards.

One of the boxes was advantageous with a 70% success probability, and the other was disadvantageous with a 30% success probability. After a certain number of trials, the advantageous and disadvantageous boxes switched roles. During the 100 trials, these switches were designed to intervene three times, the timing of which varied across experiments. For instance, in a few experiments, the switches took place on the 30th and 70th trials, such that the right box was advantageous for the first 30 trials, became disadvantageous from the 31st to the 70th trial, and reverted to being advantageous in the last 30 trials. The timings and success probabilities were not known to participants. These settings were chosen because learning convergence was likely to be achieved within the first 30-60 trials in our past experiments; once convergence was achieved, participants only selected the same box afterwards, which in turn biased the estimates of the learning parameters.
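This reward schedule can be sketched as a small simulation. The switch points and the random policy below are illustrative assumptions (the paper varied switch timings across experiments and did not publish code):

```python
import random

def simulate_tab(policy, n_trials=100, switches=(30, 70), seed=0):
    """Two-armed bandit with reversals: the advantageous box pays 10
    points with probability 0.7 (the other with 0.3), and the boxes
    swap roles at each trial index in `switches`. `policy(t)` returns
    0 or 1. Switch timings here are illustrative only."""
    rng = random.Random(seed)
    good, total = 0, 0              # index of the advantageous box
    for t in range(n_trials):
        if t in switches:           # reversal: boxes swap roles
            good = 1 - good
        choice = policy(t)
        p = 0.7 if choice == good else 0.3
        total += 10 if rng.random() < p else 0
    return total

# A policy choosing at random earns roughly 500 points on average.
score = simulate_tab(lambda t: random.randrange(2))
```

A learning policy that tracks which box is currently advantageous can exceed this baseline, which is what the Q learning models below formalize.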

Q learning model

This study considered two types of Q learning models [48] to estimate learning parameters in the TAB. First, in the simple Q learning model, the action value Q_i(t) of the chosen option i at trial t is updated as

Q_i(t+1) = Q_i(t) + αδ(t),    (1)

with

δ(t) = R(t) − Q_i(t),    (2)

where R(t) and α (0 < α ≤ 1) are the reward of option i at trial t and the learning rate, respectively. δ(t) refers to the reward prediction error, measured by the difference between the obtained reward R(t) and the current value estimate. The action value reflects the immediate reward by scaling the prediction error with the learning rate. If the learning rate is close to 1, fast adaptations are made based on prediction errors; if it is close to 0, adaptation becomes very slow. The initial action values are set to zero so that Q_i(1) = 0 for i = 1, 2. For the unchosen option j (j ≠ i), the action value remains the same as before:

Q_j(t+1) = Q_j(t).    (3)

Denote the chosen action at trial t by a(t) ∈ {1, 2}. The probability of choosing either option is specified via the following softmax decision rule:

P(a(t) = i) = exp(βQ_i(t)) / [exp(βQ_1(t)) + exp(βQ_2(t))],    (4)

where P(a(t) = i) indicates the probability of choosing action a(t) = i at trial t. The parameter β refers to the inverse temperature, which assesses the relative strength of exploitation vs. exploration. Exploitation indicates "the optimization of current tasks under existing information and memory conditions", while exploration signifies "wider and sometimes random searches and trials that do not coincide with the optimal solutions provided by the exploitation" [49]. A higher β value suggests that choices are made primarily based on the action value Q, implying exploitation. Conversely, a lower β value indicates more random choices regardless of the action value Q, implying exploration, because the relative importance of the Q value in Eq (4) declines significantly. Hence, the inverse temperature β refers to the relative weight of exploitation against exploration in decision-making.
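The value update and softmax choice rule described above amount to only a few lines of code. This is a minimal sketch; the function names are ours, not the paper's:

```python
import math

def q_update(q, choice, reward, alpha):
    """Update the chosen option's value by the learning-rate-scaled
    reward prediction error; the unchosen value is left unchanged."""
    delta = reward - q[choice]          # reward prediction error
    q = list(q)
    q[choice] += alpha * delta
    return q

def softmax_choice_probs(q, beta):
    """Probability of each action under inverse temperature beta."""
    exps = [math.exp(beta * qi) for qi in q]
    z = sum(exps)
    return [e / z for e in exps]

q = [0.0, 0.0]                                     # Q(1) = 0 for both options
q = q_update(q, choice=0, reward=10, alpha=0.5)    # -> [5.0, 0.0]
probs = softmax_choice_probs(q, beta=1.0)
```

With beta = 1.0 the first option is chosen with probability exp(5)/(exp(5)+1), about 0.99, illustrating how a large value gap plus a moderate inverse temperature already produces strongly exploitative choice.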

Asymmetric Q learning model

The simple Q learning model assumes that the learning rate α is symmetric, regardless of the sign of the reward prediction error δ(t). However, related studies showed that learning rates are asymmetric [50-55]. Thus, the asymmetric Q learning model, incorporating asymmetric learning parameters, was also considered. In this model, the action value Q_i(t) of the chosen option i is updated via the following rule:

Q_i(t+1) = Q_i(t) + α+δ(t) if δ(t) ≥ 0,
Q_i(t+1) = Q_i(t) + α−δ(t) if δ(t) < 0,    (5)

where α+ and α− are the learning rates when the reward prediction errors are positive (or zero) and negative, respectively. The idea behind this specification is the positivity bias. Cazé and van der Meer [56] showed that even in simple, static bandit tasks, agents with differential learning rates can outperform unbiased agents. They suggested the existence of situations in which the steady-state behavior of asymmetric RL models yields better separation of the action values compared with symmetric RL models [56]. While this proposition was proved mathematically as an asymptotic property, real performance in cognitive tasks includes not only asymptotic properties but also transient outcomes [57]. A choice-trace parameter ϕ is added here to account for the autocorrelation of choices, which could affect the learning biases [57]. For the unchosen option j (j ≠ i), the action value is updated according to Eq (3), and the choice probability is computed via the softmax decision rule in Eq (4).
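A sketch of the asymmetric update follows. The choice trace is shown entering the softmax as an additive term, which is a common specification in the choice-trace literature; the paper does not spell out its exact placement of ϕ, so that detail is an assumption here:

```python
import math

def asymmetric_update(q, choice, reward, alpha_pos, alpha_neg):
    """Apply separate learning rates for positive (or zero) and
    negative reward prediction errors."""
    delta = reward - q[choice]
    alpha = alpha_pos if delta >= 0 else alpha_neg
    q = list(q)
    q[choice] += alpha * delta
    return q

def choice_probs_with_trace(q, c, beta, phi):
    """Softmax over beta*Q plus a choice-trace term phi*c, where c
    tracks recent choices. This placement of the trace is assumed,
    not taken from the paper."""
    logits = [beta * qi + phi * ci for qi, ci in zip(q, c)]
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

q = asymmetric_update([0.0, 0.0], choice=0, reward=10,
                      alpha_pos=0.6, alpha_neg=0.3)   # -> [6.0, 0.0]
```

A positivity bias (alpha_pos > alpha_neg, as here) makes gains move the value estimate more than equally sized losses, which is the separation effect described by Cazé and van der Meer [56].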

Estimation method

The parameters specified in the models were estimated by optimizing the maximum a posteriori (MAP) objective function, that is, by finding the posterior mode:

θ̂ = argmax_θ p(D|θ)p(θ),    (6)

where p(D|θ) is the likelihood of the data D for each subject, conditional on the parameters θ = {α, β} for the simple Q learning model and θ = {α+, α−, ϕ, β} for its asymmetric version, and p(θ) is the prior probability of θ. This study assumed each parameter is bounded and used constrained optimization to find the MAP estimates. More specifically, since α (α±) is bounded between 0 and 1 and β takes non-negative values, the priors were assumed to follow beta distributions for α (α±) and gamma distributions for β.
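Under the stated beta/gamma priors, MAP estimation for the simple model can be sketched with off-the-shelf constrained optimization. The prior shape parameters, starting point, and bounds below are illustrative assumptions, not values reported in the paper:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist, gamma as gamma_dist

def neg_log_posterior(params, choices, rewards):
    """Negative log posterior for the simple Q model: softmax choice
    likelihood plus beta/gamma log priors (shapes are illustrative)."""
    alpha, beta = params
    q = np.zeros(2)
    ll = 0.0
    for a, r in zip(choices, rewards):
        logits = beta * q
        logits = logits - logits.max()        # numerical stability
        p = np.exp(logits) / np.exp(logits).sum()
        ll += np.log(p[a] + 1e-12)
        q[a] += alpha * (r - q[a])            # Eq (1)-(2) update
    log_prior = (beta_dist.logpdf(alpha, 2, 2)
                 + gamma_dist.logpdf(beta, a=2, scale=3))
    return -(ll + log_prior)

def fit_map(choices, rewards):
    """Constrained optimization (L-BFGS-B) over bounded parameters."""
    res = minimize(neg_log_posterior, x0=np.array([0.5, 1.0]),
                   args=(choices, rewards),
                   bounds=[(1e-3, 1 - 1e-3), (1e-3, 20.0)])
    return res.x  # MAP estimates of (alpha, beta)
```

The asymmetric version works the same way with θ = (α+, α−, ϕ, β) and the corresponding update and choice rules swapped in.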

Results

To reveal the group effects, differences across individuals, dyads, and triads were examined with respect to performance and the learning parameters of the two Q learning models. After these analyses, the determinants of performance were evaluated by regression analysis. Descriptive statistics for the relevant variables are reported in Tables 1 and 2.
Table 1

Descriptive statistics (pooled sample).

                    Individuals       Dyads             Triads
                    Mean     SD       Mean     SD       Mean     SD
Performance         49.57    5.28     48.41    5.24     50.36    7.06
    max             -        -        52.37    4.22     53.95    3.99
    min             -        -        46.71    4.53     45.54    4.19
    average         -        -        49.54    3.75     49.86    3.26
(simple model)
α                   0.57     0.25     0.59     0.26     0.60     0.24
    max             -        -        0.70     0.19     0.79     0.14
    min             -        -        0.41     0.24     0.33     0.18
    average         -        -        0.55     0.19     0.56     0.14
β                   4.35     3.39     5.18     4.20     6.61     4.97
    max             -        -        6.09     3.62     7.36     3.40
    min             -        -        2.77     2.37     1.74     1.29
    average         -        -        4.43     2.70     4.30     1.88
(asymmetric model)
α+ − α−             0.08     0.35     0.04     0.36     0.11     0.35
    max             -        -        0.28     0.28     0.38     0.20
    min             -        -        -0.11    0.30     -0.21    0.28
    average         -        -        0.09     0.25     0.09     0.20
β                   4.35     3.02     4.81     3.34     5.77     3.71
    max             -        -        6.08     3.23     6.69     2.59
    min             -        -        2.78     2.19     1.95     1.42
    average         -        -        4.43     2.39     4.18     1.59

N = 568.

Table 2

Descriptive statistics (subsample).

                    Individuals       Dyads             Triads
                    Mean     SD       Mean     SD       Mean     SD
Performance         49.86    5.30     48.12    5.07     50.41    7.39
(simple model)
α                   0.57     0.25     0.58     0.26     0.58     0.24
β                   4.40     3.41     5.18     4.32     6.59     5.03
(asymmetric model)
α+ − α−             0.10     0.35     0.03     0.36     0.11     0.35
β                   4.32     2.94     4.78     3.45     5.51     3.63

N = 471.


Group differences

Performance

For performance in the TAB, the number of trials (out of 100) on which 10 points were gained was used, because the subsequent analysis applied Poisson regression, which deals with count data. The total sum of rewards is obtained by multiplying this performance measure by 10. The average performance for individuals, dyads, and triads was 49.57, 48.41, and 50.36, respectively. The Kruskal-Wallis test revealed significant group size effects on performance (χ2 = 7.21, p = .03). The pairwise Wilcoxon rank-sum test with Bonferroni adjustment then indicated differences in performance between individuals and dyads (p = .06) and between dyads and triads (p = .05). Thus, performance was higher for individuals and triads and lowest for dyads, which suggested that three heads are better than two and one head is better than two. However, the data included groups whose members did not undertake the other tests. For example, in some dyadic groups, some members did not complete Test 3, and in some triadic groups, some members did not complete Test 2. Because such samples were included, group comparisons between dyads and triads might be imprecise; in other dyadic and triadic groups, some group members completed both Tests 2 and 3. Hence, a similar analysis was conducted in the subsample of dyadic and triadic groups in which at least one group member undertook all tests. The average performance for individuals, dyads, and triads was 49.86, 48.12, and 50.41, respectively. The Kruskal-Wallis test showed significant group size effects on performance (χ2 = 11.45, p = .003). The pairwise Wilcoxon rank-sum test with Bonferroni adjustment indicated significant differences in performance between individuals and dyads (p = .003) and between dyads and triads (p = .03). Thus, the results became more significant, showing a U-shaped relationship across individuals, dyads, and triads in this subsample.
If one head is better than two, one might expect two heads to also be better than three. However, in our experiments, three heads proved superior to two. This result is the opposite of that reported by Bahrami et al. [9], who found that two heads were better than one. The contrast could be accounted for by differences in the underlying cognitive tasks of the two studies. Bahrami et al. [9] required careful detection of oddballs, whereas our study assigned prime significance to decision-making given past outcomes. In other words, the former task hinges on attention, while the latter requires further information processing.
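The test sequence used in this section, a Kruskal-Wallis test followed by pairwise Wilcoxon rank-sum tests with Bonferroni adjustment, can be sketched with SciPy (the rank-sum test is implemented as `mannwhitneyu`; the data here are synthetic, not the paper's):

```python
from scipy.stats import kruskal, mannwhitneyu

def group_comparison(ind, dyad, triad):
    """Kruskal-Wallis across the three group sizes, then pairwise
    Wilcoxon rank-sum (Mann-Whitney U) tests with a Bonferroni
    correction over the 3 comparisons."""
    h, p = kruskal(ind, dyad, triad)
    pairs = {
        "individuals vs dyads": mannwhitneyu(ind, dyad),
        "individuals vs triads": mannwhitneyu(ind, triad),
        "dyads vs triads": mannwhitneyu(dyad, triad),
    }
    adjusted = {k: min(1.0, 3 * r.pvalue) for k, r in pairs.items()}
    return (h, p), adjusted
```

The nonparametric tests are appropriate here because the performance counts need not be normally distributed across groups.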

Learning parameters

The question then arises as to how these results (i.e., three heads better than two or one, and one head better than two) were generated by group dynamics in the Q learning tasks. Group dynamics such as group pressure and risky shifts emerge as participants share experiences by working together; gradually, group members begin to exert mutual personal influence. In our study, however, such group dynamics could not take place because the amount of communication across participants was controlled. Instead, group dynamics mattered in this study with respect to learning in the TAB. To inspect this mechanism, the learning rates α, the positivity biases α+ − α−, and the inverse temperature (exploitation/exploration ratio) β were estimated and compared across individuals, dyads, and triads. The pooled sample was considered first. In the simple Q learning model, the Kruskal-Wallis test showed no group difference for α (χ2 = 1.52, p = .47), but the inverse temperature showed group differences (χ2 = 15.09, p = 5.3e-04). The pairwise Wilcoxon rank-sum test with Bonferroni adjustment indicated significant differences between individuals and triads (p = 3.3e-04) and between dyads and triads (p = .05). In the asymmetric Q learning model, the positivity biases α+ − α− were confirmed in individuals (χ2 = 20.42, p = 6.2e-06) and triads (χ2 = 12.53, p = 3.0e-04), but no positivity biases were found in dyads (χ2 = 2.18, p = .14). In addition, β showed group size effects (χ2 = 6.98, p = .03). The pairwise Wilcoxon rank-sum test with Bonferroni adjustment indicated significant differences between individuals and triads (p = .02). Next, the subsample of groups in which at least one member completed all three tests was considered. In the simple Q learning model, the Kruskal-Wallis test indicated no significant group differences for α (χ2 = .22, p = .90), but the inverse temperature showed group differences (χ2 = 10.95, p = .004).
The pairwise Wilcoxon rank-sum test with Bonferroni adjustment indicated significant differences between individuals and triads (p = .0003) and between dyads and triads (p = .09). These results were quite similar to those in the pooled sample. In the asymmetric Q learning model, the Kruskal-Wallis test confirmed positivity biases α+ − α− in individuals (χ2 = 24.75, p = 6.5e-07) and triads (χ2 = 11.19, p = 8.2e-04), but not in dyads (χ2 = 1.29, p = .26). In addition, β showed group size effects (χ2 = 6.76, p = .03). The pairwise Wilcoxon rank-sum test with Bonferroni adjustment indicated significant differences between individuals and triads (p = .03). Once again, the results remained similar to those in the pooled sample.

Within-group effects

Next, the within-group effects for dyads and triads were examined. To see the group effects, the maximum, minimum, and average of group members' individual performances and learning parameters were compared with the corresponding group variables. For example, the average of group members' individual performances was compared with the corresponding group performance; if the latter is higher, group effects are positive. The analysis was conducted on the pooled sample because the comparison was made within each group rather than across groups. In dyads, the maximum and average of individual performances outperformed group performance (statistic = 43.45, p = 4.4e-11 for the maximum and statistic = 5.49, p = .02 for the average), but the minimum underperformed group performance (statistic = 6.51, p = .01). Thus, group effects were positive in improving the minimum performance of group members but did not surpass the maximum and average performance of group members. In triads, while the maximum of individual performances outperformed group performance (statistic = 29.87, p = 4.6e-08), the minimum underperformed group performance (statistic = 36.47, p = 1.6e-09). However, the average of individual performances did not outperform group performance (statistic = .06, p = .81). This suggests that group effects were higher in triads in the sense that triads achieved higher performance than the minimum of individual performances and did not underperform relative to the average of individual performances. In dyads, the learning parameter α in the simple model and the positivity biases α+ − α− in the asymmetric model showed similar patterns, in which the group parameters outperformed the minimums of individual group members (statistic = 34.96, p = 3.4e-09 for α; statistic = 11.09, p = 8.7e-04 for α+ − α−) and underperformed their maximums (statistic = 9.78, p = .002 for α; statistic = 29.74, p = 4.9e-08 for α+ − α−).
However, while the group parameter α outperformed its average (statistic = 4.08, p = .04), that of α+ − α− was not statistically different from its average (statistic = 1.56, p = .21). Regarding the inverse temperature β, the group parameters outperformed the minimums of individual group members (statistic = 21.50, p = 3.5e-06 for the simple model; statistic = 25.06, p = 5.6e-07 for the asymmetric model) and underperformed their maximums (statistic = 7.00, p = .01 for the simple model; statistic = 10.14, p = .001 for the asymmetric model); however, they did not differ from their averages (statistic = .08, p = .78 for the simple model; statistic = .17, p = .68 for the asymmetric model). Thus, group effects were mostly identified in that group parameters surpassed the minimums of the corresponding individual group members, but these effects were not strong enough to outperform the maximums of group members. In dyads, the inverse temperature β in both models and the positivity biases in the asymmetric model neither outperformed nor underperformed their averages; only the learning parameter α outperformed its average. In triads, all parameters except α+ − α− in the asymmetric model showed a similar pattern, in which the group parameters outperformed their averages and the minimums of individual group members but underperformed their maximums (statistic = 4.46, p = .03; statistic = 60.26, p = 8.3e-15; statistic = 40.65, p = 1.8e-10 for the average, minimum, and maximum of α; statistic = 5.19, p = .02; statistic = 69.18, p = 2.2e-16; statistic = 4.27, p = .04 for the average, minimum, and maximum of β in the simple model; statistic = 7.36, p = .01; statistic = 65.24, p = 6.6e-16; statistic = 5.32, p = .02 for the average, minimum, and maximum of β in the asymmetric model). Meanwhile, the maximum and minimum of α+ − α− in the asymmetric model respectively outperformed and underperformed the group parameters (statistic = 31.61, p = 1.9e-08 and statistic = 43.08, p = 5.2e-11 for the maximum and minimum); however, the average of α+ − α− was not statistically different from the group parameters (statistic = .64, p = .42).
Hence, in both dyads and triads, all group learning parameters outperformed the minimums and underperformed the maximums of the individual group members. The difference emerged with respect to the averages. In most cases, group parameters outperformed the averages; however, the group parameters of the inverse temperature in both the simple and asymmetric models in dyads, and the positivity biases in dyads and triads, were not statistically different from their averages. Thus, the differentiating factor in within-group effects between dyads and triads was the inverse temperature in both models: triads surpassed their averages, but dyads did not.

Determinants of performance

The results on performance suggest a U-shaped relationship between group size and performance, and β seemed to account for higher performance. To examine this more rigorously, this study regressed performance on group dummy variables (Individual, Dyad, Triad) and the learning parameters. For this group comparison, the regression analysis was based on the subsample. The results for the simple and asymmetric models are shown in Tables 3 and 4, respectively.
Table 3

Determinants of performance (simple Q learning model) (SE in parentheses).

Variables       (1)         (2)         (3)
Constant        46.58***    48.46***    54.26***
                (1.06)      (0.91)      (2.57)
α               1.13        1.13        1.13
                (1.29)      (1.29)      (1.29)
β               0.17**      0.17**      0.17**
                (0.08)      (0.08)      (0.08)
Individual      1.88**
                (0.78)
Dyad                        -1.88**
                            (0.78)
Triad           2.03**      0.15
                (0.98)      (0.87)
Size                                    -7.75***
                                        (3.01)
Size squared                            1.96**
                                        (0.77)
AIC             3006.4      3006.4      3006.4

N = 471. The dependent variable is performance. Individual, Dyad, and Triad are dummy variables for individuals, dyads, and triads. Size is the number of participants (1, 2, and 3 for individuals, dyads, and triads, respectively). Because performance takes only non-negative count values, Poisson regression was applied to achieve statistical consistency.

** p < .05; *** p < .01.

Table 4

Determinants of performance (asymmetric Q learning model) (SE in parentheses).

Variables       (1)         (2)         (3)
Constant        46.58***    48.50***    54.39***
                (0.82)      (0.62)      (2.50)
α+−α−           -0.70       -0.70       -0.70
                (0.94)      (0.94)      (0.94)
β               0.34***     0.34***     0.34***
                (0.10)      (0.10)      (0.10)
φ               0.01        0.01        0.01
                (0.02)      (0.02)      (0.02)
Individual      1.92**
                (0.79)
Dyad                        -1.92**
                            (0.79)
Triad           2.05**      0.13
                (0.98)      (0.87)
Size                                    -7.87***
                                        (3.03)
Size squared                            1.99**
                                        (0.78)
AIC             3002.5      3002.5      3002.5

N = 471. The dependent variable is performance. Individual, Dyad, and Triad are dummy variables for individuals, dyads, and triads. Size is the number of participants (1, 2, and 3 for individuals, dyads, and triads, respectively). Because performance takes only non-negative count values, Poisson regression was applied to achieve statistical consistency.

** p < .05; *** p < .01.


According to the tables, Individuals and Triads were positively, and Dyads negatively, associated with performance. Moreover, the coefficient of Size was negative while that of Size squared was positive in both models, attesting to the U-shaped relationship between group size and performance. Regarding the learning parameters, the inverse temperature β significantly accounted for performance in both models. Thus, Individuals, Triads, and β were the determinants of higher performance in the TAB.
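The U-shape implied by model (3) can be checked directly from the reported coefficients. Assuming the coefficients apply on the response scale (an assumption; the paper states only that Poisson regression was used), the fitted curve 54.26 − 7.75·size + 1.96·size² bottoms out between sizes 1 and 3:

```python
# Reported coefficients from model (3) of the simple Q learning table.
b0, b1, b2 = 54.26, -7.75, 1.96

def predicted_performance(size):
    """Quadratic-in-size prediction implied by model (3)."""
    return b0 + b1 * size + b2 * size ** 2

# The quadratic is minimized at size = -b1 / (2 * b2) ~= 1.98,
# i.e., predicted performance is lowest near dyads (size 2).
vertex = -b1 / (2 * b2)
preds = {s: predicted_performance(s) for s in (1, 2, 3)}
```

With these coefficients the predicted minimum falls almost exactly at size 2, which matches the dyad disadvantage reported in the text.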

Model fits

Finally, to compare the two models, WBICs [58] were calculated for both. The average WBICs were -49.11 for the simple model and -50.58 for the asymmetric model; a test of this difference indicated no significant difference between the two (T(1134) = 1.34, p = .18), implying that the two models cannot be differentiated statistically in the pooled sample. WBICs were also calculated for each group. In individuals, the average WBICs of the simple and asymmetric models were -50.97 and -52.64, respectively, and no statistical difference was identified (T(642) = 1.19, p = .23). In dyads, the corresponding averages were -48.94 and -50.35, and once again no statistical difference was identified (T(274) = .62, p = .54). Similarly, in triads, the averages were -43.75 and -44.72, with no statistical difference (T(214) = .36, p = .71). Therefore, the two models cannot be statistically differentiated. Nevertheless, the fact that both models generated a high correlation between inverse temperature and performance, and similar patterns of within-group effects, confirms the robustness of the results with respect to model specification.
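The model comparison above amounts to a two-sample t-test on per-fit WBIC values. A hedged sketch follows, with simulated WBIC arrays standing in for the actual per-fit values (only the pooled averages and degrees of freedom are taken from the text; the spread is invented):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
# Simulated per-fit WBIC values centered on the reported pooled averages
# (-49.11 simple, -50.58 asymmetric); the standard deviation is invented.
# T(1134) in the text suggests roughly 568 fits per model in the pooled sample.
wbic_simple = rng.normal(-49.11, 20.0, size=568)
wbic_asym = rng.normal(-50.58, 20.0, size=568)

def welch_t(a, b):
    """Welch's t statistic with a normal-approximation two-sided p-value
    (adequate at sample sizes this large)."""
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / math.sqrt(va + vb)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))
    return t, p

t, p = welch_t(wbic_simple, wbic_asym)
# Lower (more negative) WBIC indicates better fit; a p-value above .05
# would leave the two models statistically indistinguishable, as reported.
```

With real data one would use the per-fit WBIC values directly; the simulated arrays here only illustrate the shape of the test.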

Discussion

One of the interesting findings of this study was that the relationship between performance and group size was U-shaped. As the regression analysis revealed, this performance difference could be attributed to higher values of the inverse temperature β in both models. In dyads, group effects on the inverse temperatures did not generate values high enough to surpass the members' averages, which might lead to lower performance. In contrast, triads gave rise to inverse temperatures higher than the averages of their group members. These differences are responsible for the U-shaped relationship in performance. Although the model selection tests did not differentiate between the simple and asymmetric Q learning models, both yielded the same result: the inverse temperature β accounted for higher performance. Thus, our results are robust to model specification. At the individual level, participants were more likely to play the two-armed bandit game in an exploratory manner because their inverse temperatures were lower than those of dyads and triads. The emphasis on exploration at the individual level indicates that rationality, in terms of exploitation within the underlying learning model, increased as more members were added to the group decision-making process. To reach agreement in groups, logical reasoning and persuasion based on rational calculation would be required instead of exploration. Yet, in dyads, this increase in exploitation was not sufficient to differ significantly from individuals; indeed, group effects could not generate values of β higher than the members' averages. It could be inferred that dyads encountered learning incoherence, leading to smaller group effects on the inverse temperature. According to Simmel [59], social interaction in dyads is more personal, involves more affect or emotion, and generates greater variability.
This negative aspect of social interaction seemed to appear in dyads in our experiments. On the other hand, Simmel [59] argued that triads are the smallest structure that tends to constrain emotions, reduce individuality, and generate behavioral convergence or uniformity because of the ''two against one" social pressures. These forces form the basis for uniformity, emergent norms, and cohesion [60]. Consequently, while dyads failed to improve the inverse temperature beyond its average as a result of affective or emotional influences, the smallest social structure, in the form of a triad, improved efficiency through social pressures and more exploitation. This is also consistent with the theoretical hypothesis in S1 Appendix, where dyads are likely to adopt more randomized learning strategies, whereas individuals and triads adopt coherent ones. Although individuals might behave in a more exploratory fashion, exploration itself is one of the coherent learning strategies. Hence, our empirical results support the hypothesis that learning incoherence takes place in dyads but not in triads. Notably, positivity biases were confirmed for individuals and triads, but no such learning biases existed for dyads. As related studies indicated [50-55], learning biases are more likely in such learning situations. This result further evidences learning coherence in individuals and triads and learning incoherence in dyads. Apart from this main result, the fact that group parameters achieved values higher than the means of their individual members for most of the learning parameters deserves attention in its own right. Not only triads but also dyads showed these positive effects, and future studies should explore them in more detail. However, our findings are subject to several limitations. First, the results depend critically on the tasks that the groups perform and on the learning situations in which the TAB games are played.
Different game settings could lead to different results. Second, learning properties could change over time through learning; therefore, their reliability might be limited. Performance probably changed as participants undertook more TAB games because of the stochastic nature of the rewards. However, it can be conjectured that learning strategies tend to be relatively stable because participants could not fully detect the stochastic environments (i.e., which options were more likely to generate higher rewards), as the probability of obtaining higher gains was changed twice during the 100 trials. Hence, participants seem to have been unlikely to change their learning strategies even when they undertook the TAB several times, which justifies the use of learning properties in this study. Nevertheless, the reliability of learning properties should be tested in future studies. Third, although this study used a relatively large sample, different results could be found in different samples, in particular in different cultural contexts. For example, Shen et al. [61] noted that, when examining the effects of risk-taking on convergent thinking, risk-taking was negatively associated with convergent thinking in China, but these correlations were close to zero or negative in the Netherlands. Thus, cultural effects could alter learning strategies in the TAB and, hence, the effects of group dynamics on group performance. Despite these limitations, the findings of this study deserve attention because previous studies did not examine the effects of group dynamics in terms of learning properties. Moreover, the results are intuitive and consistent with the simple hypothesis that the U-shaped relationship in performance emerged from the coherence of learning strategies.
Even though these results might not be supported in different experimental settings; our computational approach could still be applied and is expected to generate new results. Thus, the contribution in this study would be more methodological. This study encourages future research that examines the learning mechanism of group dynamics, according to the computational approach suggested in this study. (DOCX) Click here for additional data file. 19 Jan 2021 PONE-D-20-35580 Three heads are better than two: Comparing learning properties and performances across individuals, dyads, and triads through a computational approach PLOS ONE Dear Dr. Harada, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Two experts in the field reviewed your work.  I also read the paper myself, as I find this an intriguing topic, related to my own research.  The reviewers seemed to find the topic of your study interesting, and perhaps intriguing; however, they both note major issues with the paper that will need to be addressed, if this paper is to reach the bar for publication.  It seems a major issue is to improve the modeling work and data analysis to where it is consistent with similar work in the reinforcement learning and mathematical modeling literature.  You also need to better explain the key points and main goals for your study. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. 
You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Darrell A. Worthy, Ph.D Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1.) Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2.) Please change "female” or "male" to "woman” or "man" as appropriate, when used as a noun (see for instance https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender). 3.) For this single-authored manuscript, please replace "we" with "I". 4.) We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. 
If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. 5.) Please include your tables as part of your main manuscript and remove the individual files. Please note that supplementary tables should  be uploaded as separate "supporting information" files. 6.) We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed: - https://www.emerald.com/insight/content/doi/10.1108/S0065-2830(2010)0000032004/full/html - https://www.sciencedirect.com/science/article/abs/pii/S0022249616301523?via%3Dihub - http://www.wjh.harvard.edu/~cfc/Publications.html? - https://www.sciencedirect.com/science/article/abs/pii/S0049089X13000884?via%3Dihub In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: No ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Summary The paper examines the question whether cooperative decision making can improve performance in 2-arm bandit task (TAB). Interactive decision making experiments are conducted online using ZOOM meeting software. A variant of Q-Learning is implemented to examine learning parameters (total reward, risk seeking, loss aversion, action selection noise) across the three conditions. Results are complex and do not present a straightforward interpretation by themselves. The discussion is, correspondingly, unclear and offers little clarity. 
In the spirit of paper’s main research question, a colleague of mine and I reviewed the paper separately and exchanged comments to arrive at a joint review. The paper has proposed a good question and used a remarkably inventive methodology to implement joint decision making in the time of pandemic. The paper could potentially make a valuable contribution. But there is a long way to go. Major comments There is no theory. We do not know what to expect and WHY to expect it. Benefit in joint decision making is not the default (eg many works in joint Memory show that 2 people are worse than one). Could the author provide a formal model of dyadic and triadic performance that could produce some anticipated outcomes? In this regard, a very relevant paper to consider is Migdal et al (J Math. Psych. 2012). The introduction discusses previous findings and theoretical claims regarding collective cognition and group intelligence. However, very little is written about reinforcement learning in general, and Q-learning in particular. Relevant issues include the reasons for choosing to focus on learning (as opposed to perceptual decision-making in previous research), the ecological validity of this task, and what we already know about Q-learning in individuals. Moreover, the hypotheses and predictions should be fleshed-out and justified in the introduction. It is currently unclear what the hypotheses are. One can imagine that under a reasonable model of individual performance, it should be possible to simulate the dyadic and triadic decisions and have some predictions that could be directly compared to the data. At present, the only motivation to do the experiments seems to be to see “What could happen” and that can certainly be improved up on. There is no model comparison. We wondered to what extent the result depend on the exact implementation of Q-learning used in this study. 
It would be more convincing to show that the general pattern of results is consistent across a family of Q-learning models. For example, Equation 1 includes a constant noise that is not always present in Q-learning models. This equation also involves two learning rate parameters, one for negative and one for positive prediction errors. Do the results also hold in models with one learning rate, and without random noise? In a similar vein, do they also hold in a model that uses raw rewards, rather than prospect-utility-transformed values? Moreover, it can be illuminating to show potential differences in the best-fit model between the group sizes. For example, did a model with 2 learning rates fit the data better than a single-alpha model in all group sizes? This could be interesting, since it could be that not all group sizes weigh gains and losses differently (or equally). The design is unclear More clarity about design is needed. All subjects did the individual condition. Some did the dyadic (130 dyadic groups) and some did the triadic (110 triadic). These numbers show, and the paper indeed indicates that some participants did all three conditions but we do not know how many they were. Did any participants take part in more than 1 dyadic or triadic group? All of these issues make a big difference to establishing the right baseline. For example, when we see the comparison of total collected reward in Figure 2, it makes more sense if the dyadic condition is compared to individual performance of the subjects who took part in dyadic condition and exclude subjects who did not. A similar issue applies to triadic condition which should have its own individual control. The above would then allow the analysis of performance to be made not only between groups each group and the average of its individual but also between group and the best participant within a group. 
Also, if possible, it would be advisable to try out comparing triads to the best dyad pairing within a group (see Wahn et al., 2018, PLoS one, for a similar approach studying visual search). These analyses will clarify whether the group-size differences reflect statistical aggregation or genuine group dynamics. Such an analysis should not only be made on the level of overall performance, but also on the level of model parameters. In other words, understanding the relationship between individual-level and group-level model parameters can illuminate the dynamics of group learning. But since there is a discrepancy between the number of people who took the dyadic and triadic conditions, I am not sure how feasible this option is. More minor points: 1. p. 2: “causes of synergy” -> the term causes is too strong. Correlates? Factors associated with? 2. p. 4: “To convert the group from…”: sentence unclear. Please elaborate. 3. The author refers to a single individual as a group. This is a strange decision. It is more natural to use the terms “individual” for N=1, and “group” for N>1. Accordingly 4. P. 6 and onward: It is better to regard the study as comprised of a single, multi-condition experiment, rather than 3 different experiments. 5. P.7: how were the number of trials per run (before the probability reversals) distributed? How were the reversal points determined? 6. Procedure: who was the participant that responded in each trial? For example, did the participants take turns, or was the decision determined by the first participant who answered? 7. P. 10: what were the specific parameters of the distributions that were used as priors for parameter estimation? 8. Given the impressive sample-size used in this study, it would be very valuable to provide the readers access to the raw data, as well as the analysis code. 9. P. 11 and onward: exact p-values should be given for all statistical tests (significant or not). 10. P. 
12, top: the difference between individuals and dyads does not meet the standard .05 criterion. The same is true for p. 14, top. However, I do not think that the Bonferroni correction is needed when having only 2 comparisons, so that the uncorrected p-value can be used in the former case. 11. P. 13, top paragraph: mode details should be given regarding the parameters that did not vary across group sizes (i.e., descriptive and inferential statistics). 12. A section on parameter recovery is missing. It is important to show the precision in which the parameter values could be recovered using the number of trials and participants used in this study. This can strengthen the claims regarding group-size invariance in some of the parameters. 13. Figures 2-4 do not provide much more information than is already given in the text, and hence should be omitted. 14. The bar graphs used to show the data are quite outdated compared to what is acceptable and standard practice these days which includes superimposing the data points on top of the bars and/or showing the distributions using violin plots and similar tools Reviewer #2: This is an interesting study with (to my knowledge) a novel finding. In this study people performed a two-armed bandit task. They did this first individually, and later in a separate session they performed the same task as in groups of two or three people. The results suggest that groups of two people did worse than either individuals or groups of three people. This is a good experiment and the main behavioral result is interesting and appears to be novel. But, there are some serious issues with the paper. It is not ready for publication as it is. In particular many aspects of the modeling are unclear. Without more detail it is not clear whether the model is appropriate, or whether it accurately characterizes the data. Equation 1, phi is never explained. What does it represent and what function does it play in the model? It is also listed on pg. 
10 as one of the parameters, but nothing is said about its prior. I am also confused about the learning rate alpha. I’m guessing that the plus/minus superscript means there are separate learning rates for when the prediction error is positive or negative, but this should be stated explicitly (or explained if it is doing something else). Also, alpha has a t subscript which implies it is dependent on the trial somehow, but if so, it is never explained. I think that equation 3 is supposed to specify that you use the top part if R(t) is greater than 0, and the bottom part if it is less than 0. But, on that note, the experiment only ever has positive rewards, so the bottom half of the equation would never be used. This means that with the present design the parameter v is never used and serves no purpose. It also means that (because there are never losses) this study cannot assess risk attitudes, so all sections of the paper related to risk are not valid. This is a major issue for the interpretations of the paper. It also calls into question whether the model is fitting well, and therefore whether analyses of other parameters (like inverse temperature) are meaningful. The priors should also be better specified. The type of distributions are noted, but the parameters of those distributions are not mentioned. Overall, the model as it was applied does not appear to be appropriate, and a number of parts of it are not adequately explained. It is also a relatively complicated model for a somewhat simple task—which is not necessarily a problem, but some of the choices need to be justified--such as using different learning rates for positive and negative prediction errors. Results of the model fit also needs more detail (i.e., fit statistics) so that we can assess how well it is characterizing the data. 
--------------------------------------------------- Other issues and typos: The paper implies that triads do better than individuals, but the difference is not significant, so the authors need to be more careful about how the results are presented. I noticed this in the abstract, but it might say it elsewhere as well pg. 8 – the text jumps into explaining the modeling in the section that explains the task. It should probably be its own section top of pg. 4 'put' should be 'puts' pg. 5 "single" maybe should be "signal" pg. 13. first sentence of 2nd paragraph. 'the' should be 'a' ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 
3 Mar 2021 We greatly appreciate reviewers’ helpful comments and we believe we could significantly improve the quality of the paper. Here are how we revised the manuscript in response to the comments. The editor requested “It seems a major issue is to improve the modeling work and data analysis to where it is consistent with similar work in the reinforcement learning and mathematical modeling literature. You also need to better explain the key points and main goals for your study”. We believe by responding to reviewer’' requests and comments, these challenges were satisfied in the current revised manuscript. Main changes First, the main revisions in the current manuscript are as follows: - In response to the comment by Reviewer 1 that no theory was presented, we described a very simple model in Introduction that analyzed the performance differences across individuals, dyads, and triads, leading to the U-shaped relationship, which was caused by the coherence and incoherence of learning. - In response to Reviewer 2’s comment on the appropriateness of the Q learning model and suggestions to simplify the model and Reviewer 1’s inquiry into whether the results change or not when only one learning rate is introduced, we considered two models: (1) simple Q learning model with only one learning rate; (2) asymmetric learning model that only allowed for changing learning rates to the sign of reward prediction errors. The results in both models were the same: inverse temperature accounted for performance. - Following the advice by Reviewer 1, we examined within-group effects in which group learning parameters and the maximum, average and minimum of its individual group members’ learning parameters were compared. - Following the request by Reviewer 1, in analyzing group differences, we examined not only the pooled sample, but also the subsample in which all group members experienced both dyads and triads experiments. 
This subsample consists of 161 individuals, 56 dyads, and 42 triads. Although a few differences emerges between the pooled and subsample, main results remained the same. Responses to Reviewer 1: 1. There is no theory. We do not know what to expect and WHY to expect it. … Moreover, the hypotheses and predictions should be fleshed-out and justified in the introduction. It is currently unclear what the hypotheses are. -->We introduced a simple model of learning of individuals, dyads, and triads, leading to the U-shaped relationship of group performance in the Introduction. This model follows the arguments of Simmel and proposition on even-sized groups vs. odd-sized group differences. We interpret these as learning coherence of odd-sized (individuals and triads in our manuscript) and learning incoherence of even-sized (dyads). The hypothesis was the U-shaped relationship across individuals, dyads, and triads, and learning inefficiency in dyads, which we believe were shown by our analysis. 2. However, very little is written about reinforcement learning in general, and Q-learning in particular. Relevant issues include the reasons for choosing to focus on learning (as opposed to perceptual decision-making in previous research), the ecological validity of this task, and what we already know about Q-learning in individuals. --> In Introduction, we added “Thus, this study adopted a reinforcement learning (RL) framework (34) to account for decision-making and learning behaviors in the two-armed bandit (TAB) problems, which is the standard model for model-based analysis of choice behavior. The RL framework has been extensively studied in the context of multi-armed bandit problems, in particular, with close association with neural signals in various cortical and subcortical structures that behaved as predicted (35-38). Moreover, the RL framework has also been adopted to study decision-making and learning in various social contexts (39-45). 
Nevertheless, to the best of our knowledge, this framework has not been applied to the study of group decision-making. One advantage of taking this computational approach is that learning parameters can be estimated for groups and also compared across and within groups of different sizes.”

3. There is no model comparison. … Do the results also hold in models with one learning rate, and without random noise? In a similar vein, do they also hold in a model that uses raw rewards, rather than prospect-utility-transformed values? Moreover, it can be illuminating to show potential differences in the best-fit model between the group sizes.
--> In the previous version, we considered only one variant of the Q-learning model. In this revision, we consider (1) a simple Q-learning model with a single learning rate and (2) an asymmetric learning model, and compare model fit by calculating WBIC in the subsection “Model fits” in Results. The statistical test did not differentiate between the two models.

4. The design is unclear. … some participants did all three conditions but we do not know how many they were. Did any participants take part in more than 1 dyadic or triadic group? … it makes more sense if the dyadic condition is compared to individual performance of the subjects who took part in dyadic condition and exclude subjects who did not. A similar issue applies to triadic condition which should have its own individual control.
--> Regarding these questions, we added the following to Methods: “In test 2 (dyads), 23 participants played the TAB twice. In test 3 (triads), 72 participants played twice, and 14 participants played three times.” In the group comparison, we also considered the subsample in which all participants took part in both the dyad and triad tests, and compared performance and learning parameters across the three groups.

5. Also, if possible, it would be advisable to try out comparing triads to the best dyad pairing within a group.
...Such an analysis should not only be made on the level of overall performance, but also on the level of model parameters.
--> Following this, we compared group performance and learning parameters with the maximum, minimum, and average of the individual members’ performance and learning parameters. The results are described in the subsection “Within group effects” in Results.

Minor points

- “causes of synergy” --> changed to “factors associated with synergy”.
- “To convert the group from…”: sentence unclear. --> We deleted the sentence.
- It is better to regard the study as comprised of a single, multi-condition experiment, rather than 3 different experiments. --> We rephrased these as “test 1”, “test 2”, and “test 3” instead of experiments 1, 2, and 3.
- P.7: how were the number of trials per run (before the probability reversals) distributed? How were the reversal points determined? --> Although we may have misunderstood this question, as described, the probability reversals in each run took place at the 31st and 71st trials. For the first 30 trials, the reward probabilities remained the same; they were then reversed and held fixed up to the 70th trial; the second reversal took place at the 71st trial, and the subsequent trials retained the same probabilities. Please let us know if this explanation (and that in the article) is still ambiguous.
- Procedure: who was the participant that responded in each trial? For example, did the participants take turns, or was the decision determined by the first participant who answered? --> In each trial, group members discussed and decided jointly, communicating via breakout sessions in Zoom. We added the following to Experiments in Methods: “Group members in tests 2 and 3 freely communicated via Zoom during the session, while sharing test screens in PsyToolkit, and made choices.”
- What were the specific parameters of the distributions that were used as priors for parameter estimation?
--> We described in Methods: “More specifically, since α (α^±) is bounded between 0 and 1 and β takes non-negative values, their priors were assumed to follow beta distributions for α (α^±) and a gamma distribution for β.”
- Given the impressive sample-size used in this study, it would be very valuable to provide the readers access to the raw data, as well as the analysis code. --> Yes, we will upload the raw dataset immediately after acceptance.
- P. 11 and onward: exact p-values should be given for all statistical tests (significant or not). --> P values were added for all test results in the text.
- P. 12, top: the difference between individuals and dyads does not meet the standard .05 criterion. The same is true for p. 14, top. However, I do not think that the Bonferroni correction is needed when having only 2 comparisons, so that the uncorrected p-value can be used in the former case. --> This is correct. However, since we were interested in establishing the U-shaped relationship across the three groups, we would like to adhere to the multiple-comparison framework.
- P. 13, top paragraph: more details should be given regarding the parameters that did not vary across group sizes (i.e., descriptive and inferential statistics). --> The descriptive statistics were added to the text in Tables 1 and 2.
- A section on parameter recovery is missing. It is important to show the precision with which the parameter values could be recovered using the number of trials and participants used in this study. This can strengthen the claims regarding group-size invariance in some of the parameters. --> This is an important suggestion, but we are not sure whether parameter recovery is feasible here, given that individual parameter estimations were conducted in this study. Our understanding is that parameter recovery can be applied when a small number of parameters is estimated, such as in hierarchical Bayesian estimation in which common parameters are estimated across different individuals.
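To make the modeling pieces discussed in these responses concrete (Q learning with asymmetric learning rates α^±, a softmax with inverse temperature β, the choice trace ϕ, beta/gamma priors, and probability reversals at the 31st and 71st trials), here is a minimal simulation sketch. The reward probabilities (0.7/0.3), the prior shape parameters, and the choice-trace settings are illustrative assumptions, not values taken from the manuscript.

```python
import math
import random

rng = random.Random(0)

# Reward probabilities of the two arms. As described in the response, the
# probabilities reverse at the 31st and 71st trials (1-indexed). The values
# 0.7/0.3 are illustrative assumptions, not those of the actual task.
def reward_probs(trial):
    if trial <= 30 or trial >= 71:
        return (0.7, 0.3)
    return (0.3, 0.7)

# Illustrative draws from the stated prior families: beta priors for the
# learning rates (support [0, 1]) and a gamma prior for the inverse
# temperature (non-negative). The shape parameters are assumptions.
alpha_pos = rng.betavariate(2.0, 2.0)  # learning rate for positive RPEs
alpha_neg = rng.betavariate(2.0, 2.0)  # learning rate for negative RPEs
beta = rng.gammavariate(2.0, 1.5)      # inverse temperature
phi = 0.5                              # choice-trace weight (assumed)

Q = [0.0, 0.0]  # action values
C = [0.0, 0.0]  # choice trace, capturing choice autocorrelation
total_reward = 0

for t in range(1, 101):
    # Softmax over beta*Q plus the choice-trace term phi*C.
    logits = [beta * Q[a] + phi * C[a] for a in (0, 1)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    p_left = exps[0] / (exps[0] + exps[1])
    choice = 0 if rng.random() < p_left else 1

    reward = 1 if rng.random() < reward_probs(t)[choice] else 0
    total_reward += reward

    # Asymmetric update: the learning rate depends on the sign of the
    # reward prediction error (the simple model uses a single rate).
    rpe = reward - Q[choice]
    Q[choice] += (alpha_pos if rpe >= 0 else alpha_neg) * rpe

    # Exponentially decaying choice trace (1 for the chosen arm).
    for a in (0, 1):
        C[a] += 0.3 * ((1.0 if a == choice else 0.0) - C[a])
```

Fitting would proceed the other way around: maximizing the likelihood of observed choices under these equations, with the stated beta/gamma priors; the sketch only simulates behavior under one parameter draw.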
In this study, each individual and each group is assumed to have different parameters, so we estimated parameters for each individual and each group. The total number of estimation units amounted to 570 (322 individuals, 138 dyads, and 110 triads). With 2 parameters per unit in the simple Q model, the total number of parameters is 1140. We are therefore not sure how parameter recovery for these 1140 estimates could be reported in the paper.
- Figures 2-4 do not provide much more information than is already given in the text, and hence should be omitted. The bar graphs used to show the data are quite outdated compared to what is acceptable and standard practice these days, which includes superimposing the data points on top of the bars and/or showing the distributions using violin plots and similar tools. --> These figures were deleted.

Responses to Reviewer 2:

Equation 1: phi is never explained. What does it represent and what function does it play in the model? It is also listed on pg. 10 as one of the parameters, but nothing is said about its prior.
--> ϕ is added because Q learning tends to generate autocorrelation of choices, which might bias the estimates of the learning parameters, as demonstrated by Katahira (2018). In the text, we added: “ϕ is added here as the choice trace to account for autocorrelation of choice, which could affect the learning biases (56).”
--> The comments on the previous learning model were correct, and we changed the models completely. The two new models are much simpler than the previous one and more standard, being used in many related studies. We hope that this time Reviewer 2 has no serious concerns about the models.

The paper implies that triads do better than individuals, but the difference is not significant, so the authors need to be more careful about how the results are presented.
I noticed this in the abstract, but it might say it elsewhere as well.
--> This is correct, and we removed the corresponding sentences. Also, in the simple model in the Introduction, we showed that individuals outperform triads if p < .5 but underperform otherwise; thus, the relationship between the two should be, and indeed was, indeterminate.

pg. 8 – the text jumps into explaining the modeling in the section that explains the task. It should probably be its own section.
--> We created the subsection “Two-armed bandit problem” and explained the game there, then moved on to the explanation of the models.

top of pg. 4: 'put' should be 'puts'; pg. 5: "single" maybe should be "signal"; pg. 13, first sentence of 2nd paragraph: 'the' should be 'a'.
--> We corrected all of these.

Submitted filename: Revision List.docx

7 Apr 2021

PONE-D-20-35580R1
Three heads are better than two: Comparing learning properties and performances across individuals, dyads, and triads through a computational approach
PLOS ONE

Dear Dr. Harada,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I sent your paper back to the two original reviewers; R2 was satisfied with the revisions you made, but R1 still noted some concerns. It seems the concerns center on the lack of attention to detail, as well as emphasizing the novel theoretical advances made by your paper. I invite you to submit a revision, but please pay special attention to the points raised by R1. An overarching concern is that this paper feels as though it was written in a hasty manner, simply to get another publication, and it needs to reach a certain level of quality before it is published.
Please be candid about noting the strengths and limitations of your study, so that the conclusions are supported by the data. If you choose to submit a revision, I will evaluate the manuscript and decide whether to ask R1 to review it once again.

Please submit your revised manuscript by May 22 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io.
Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Darrell A. Worthy, Ph.D
Academic Editor
PLOS ONE

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available.
If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper has been substantially changed; whether it has improved, I am not sure.
1. In the introduction, we see an attempt at motivating a theoretical justification in pp. 6-7. Frankly, I did not understand any of the notation or its relationship to the study or how it could then be a motivation for the experiment.
2. Two RL models are presented and applied to the data; their fits are equally good and cannot be differentiated. In addition, the models do not make any different predictions for the experiments either. One is left wondering what the purpose of the exercise is.
3. In the first round, we asked for clarification about which subjects did the 1, 2 or 3-person experiments and how many rounds they did. Here, this request has been addressed but the rebuttal does not really solve any problem. On pp 10-11 (line 172-185) the descriptions are more confusing than helping. For example, in line 184, we are told that 161 individuals participated in triadic experiments but 161 is not divisible by 3.
This leaves the reader with the impression that there was no real systematicity to the arrangement of the experimental participation. I am afraid I cannot be positive.

Reviewer #2: The author addressed all of my previous concerns well. Particularly, the modeling approach used and the way the modeling is explained are both much improved.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

25 Apr 2021

We greatly appreciate the reviewers’ helpful comments. This time, Reviewer 2 accepted the revision, so the current revision was made in response to Reviewer 1’s comments.
Main changes

The main revisions in the current manuscript are as follows:
- The subsample criteria in the previous manuscript were too strict and severely limited the sample size, so this time we relaxed the criteria for selecting the subsample used to compare the different groups. We now select the sample according to the criterion that at least one member of each dyadic and triadic group undertook Test 3 and Test 2, respectively. The results remained the same as before, but the size of this subsample is now 262 individuals, 116 dyadic groups, and 93 triadic groups, much larger than the previous subsample.
- We described our hypothesis more clearly in the Introduction and moved the algebraic model to the Appendix.
- In the previous manuscript, we examined the learning rates α^± separately. This time, we calculated the positivity bias α^+ − α^−, because many related studies have reported positivity biases for many participants. In this manuscript, positivity biases were found in individuals and triadic groups, but not in dyadic groups. This could be one piece of evidence for learning incoherence in dyadic groups. We also examined the positivity bias in the within-group effects.
- The description of the number of participants who took tests several times seemed to confuse Reviewer 1, so we rewrote the relevant part to clarify that, in each group condition, some participants undertook tests more than once.

Responses to Reviewer 1:

1. “In the introduction, we see an attempt at motivating a theoretical justification in pp. 6-7. Frankly, I did not understand any of the notation or its relationship to the study or how it could then be a motivation for the experiment.”
- First, we rewrote this part to make it clearer. The hypothesis is as follows: odd-numbered group sizes (individuals and triads) generate higher performance due to learning coherence, while even-numbered group sizes (dyads) generate lower performance due to learning incoherence.
Learning coherence emerges because a majority subgroup can take the initiative in decision making over time: in triadic groups, two members out of three can form a majority and decide, and in individuals, a single member decides. Learning incoherence arises when no such majority subgroup emerges, as in dyadic groups, where the two members either agree or do not.
- The simple algebraic model simply describes this situation and compares expected rewards for individuals, dyads, and triads.
- In the empirical study, we first compared performance across individuals, dyads, and triads, and then compared learning properties. We interpret learning incoherence as occurring in dyads because the positivity bias did not emerge and the inverse temperature lay somewhere between those of individuals and triads. Triads pursued a more exploitative strategy and individuals a more exploratory one, whereas dyadic groups took a stuck-in-the-middle strategy, which we interpret as one piece of evidence of learning incoherence. Thus, we believe our hypothesis was supported by the empirical study: individuals and triads generated higher performance due to learning coherence, and dyads generated lower performance due to learning incoherence.
- As for the model part, while it was moved to the Appendix, we could also drop it if the reviewers consider it unnecessary. The detailed explanation is provided in the file "Revision List".

2. “Two RL models are presented, and applied to the data and their fits are equally good and cannot be differentiated. In addition, the models do not make any different predictions for the experiments either. One is left wondering what the purpose of the exercise is.”
- The two RL models were prepared in response to Reviewer 1’s request for model comparison. The comparison did not differentiate between the two models. We think two responses are possible. One is to drop either model.
Since the positivity biases could not be evaluated in the simple model, that model could be dropped. The other possibility is to keep both models and examine whether the learning properties remain the same between the two alternatives. We took the latter approach and checked the robustness of the results. The results indicated that inverse temperatures accounted for higher performance in both models, which we believe demonstrates the robustness of the findings. Although reporting the results of both models seems more persuasive, if Reviewer 1 prefers to drop either model, we will drop the simple model and report the results of the asymmetric model alone.

3. “In the first round, we asked for clarification about which subjects did the 1, 2 or 3-person experiments and how many rounds they did. Here, this request has been addressed but the rebuttal does not really solve any problem. On pp 10-11 (line 172-185) the descriptions are more confusing than helping. For example, in line 184, we are told that 161 individuals participated in triadic experiments but 161 is not divisible by 3. This leaves the reader with the impression that there was no real systematicity to the arrangement of the experimental participation.”
- We are afraid this description was misunderstood. Please note that, in the subsample, some individuals undertook tests more than once. Thus, 161 individuals participated in triadic groups, but some of them did so more than once, so the number of individuals (161) need not be a multiple of three.
- However, given that our description invited misunderstanding, we rewrote this part with more information. Since we changed the subsample this time, we rewrote the corresponding part.
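The participation bookkeeping laid out in the breakdown that follows can be cross-checked mechanically. A small sketch using only the counts quoted in this response (each assertion mirrors one of the sums in the breakdown):

```python
# Cross-check of the participation counts. Each dyad contributes 2
# participations per test and each triad 3; an individual who took a test
# k times contributes k participations.

def participations(counts_by_times):
    """counts_by_times[k-1] = number of individuals who took the test k times."""
    return sum(k * n for k, n in enumerate(counts_by_times, start=1))

# Pooled sample: 138 dyads, 108 triads.
assert participations([230, 23]) == 138 * 2       # 276 dyadic participations
assert participations([153, 66, 13]) == 108 * 3   # 324 triadic participations

# Subsample: 116 dyads, 93 triads.
assert participations([200, 16]) == 116 * 2       # 232
assert participations([142, 52, 11]) == 93 * 3    # 279

# Tests 2 and 3 combined in the subsample.
assert participations([84, 118, 49, 11]) == 116 * 2 + 93 * 3  # 511 participations
assert 84 + 118 + 49 + 11 == 262                  # distinct individuals
```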
- Let us examine the relation between the number of individuals and the numbers of dyadic and triadic groups as follows.

Pooled sample:
# of individuals = 322
# of dyads = 138 (230 and 23 individuals took the test once and twice, respectively)
# of triads = 108 (153, 66, and 13 individuals took the test one, two, and three times, respectively)
In dyads, the total # of participations is 276 (= 138*2): 230 from individuals taking the test once, plus 46 (= 23*2) from those taking it twice; sum: 230 + 46 = 276.
In triads, the total # of participations is 324 (= 108*3): 153 from individuals taking the test once, 132 (= 66*2) from those taking it twice, and 39 (= 13*3) from those taking it three times; sum: 153 + 132 + 39 = 324.

Note that some participants undertook both Tests 2 and 3. We describe this information for the subsample because it is relevant to the group comparison, which used the subsample alone in this manuscript.

Subsample:
# of individuals = 262
# of dyads = 116 (200 and 16 individuals took the test once and twice, respectively)
# of triads = 93 (142, 52, and 11 individuals took the test one, two, and three times, respectively)
In dyads, the total # of participations is 232 (= 116*2): 200 once, plus 32 (= 16*2) from those taking it twice; sum: 200 + 32 = 232.
In triads, the total # of participations is 279 (= 93*3): 142 once, 104 (= 52*2) twice, and 33 (= 11*3) three times; sum: 142 + 104 + 33 = 279.

We also provide the figures for those taking either or both Tests 2 and 3 in the subsample, because this is related to the group comparison:
Total # of participations in Tests 2 and 3 = 511 (= 116 dyads*2 + 93 triads*3)
# of individuals taking these tests once, twice, three, and four times = 84, 118, 49, and 11, respectively
Sum of participations: 84 + 118*2 + 49*3 + 11*4 = 511
# of individuals: 84 + 118 + 49 + 11 = 262

The related information is provided in the following sentences: “All participants in this sample undertook Test 1 once and at least one more experiment in Tests 2 or 3.
In Test 2 (dyads), 230 and 23 participants played the TAB once and twice, respectively. In Test 3 (triads), 153, 66 and 13 participants played one, two, and three times, respectively.” “In total, 262 individuals were in this subsample. In this subsample, 116 dyadic groups and 93 triadic groups were identified. In Tests 2 and 3, the numbers of individuals who took the tests one, two, three, and four times were 84, 118, 49, and 11, respectively. Of the dyads, 216 individuals participated, and 200 and 16 undertook Test 2 once and twice, respectively. Of the triads, 205 individuals participated, and 142, 52 and 11 individuals undertook Test 3 one, two and three times, respectively.”

Submitted filename: Revision List.docx

11 May 2021

Three heads are better than two: Comparing learning properties and performances across individuals, dyads, and triads through a computational approach
PONE-D-20-35580R2

Dear Dr. Harada,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact.
If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Darrell A. Worthy, Ph.D
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

31 May 2021

PONE-D-20-35580R2
Three heads are better than two: Comparing learning properties and performances across individuals, dyads, and triads through a computational approach

Dear Dr. Harada:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Darrell A. Worthy
Academic Editor
PLOS ONE