Luke I Rowe, John Hattie, Robert Hester.
Abstract
Collective intelligence (CI) is said to manifest in a group's domain-general mental ability. It can be measured across a battery of group IQ tests and statistically reduced to a latent factor called the "c-factor." Advocates have found the c-factor predicts group performance better than individual IQ. We test this claim by meta-analyzing correlations between the c-factor and nine group performance criterion tasks generated by eight independent samples (N = 857 groups). Results indicated a moderate correlation, r, of .26 (95% CI .10, .40). All but four studies comprising five independent samples (N = 366 groups) failed to control for the intelligence of individual members using individual IQ scores or their statistically reduced equivalent (i.e., the g-factor). A meta-analysis of this subset of studies found the average IQ of the groups' members had little to no correlation with group performance (r = .06, 95% CI -.08, .20). Around 80% of studies did not have enough statistical power to reliably detect correlations between the primary predictor variables and the criterion tasks. Though some of our findings are consistent with claims that a general factor of group performance may exist and relate positively to group performance, limitations suggest alternative explanations cannot be dismissed. We caution against prematurely embracing notions of the c-factor unless it can be independently and robustly replicated and demonstrated to be incrementally valid beyond the g-factor in group performance contexts.
Keywords: C-factor; Collective intelligence; G-factor; Group performance; IQ
Year: 2021 PMID: 33813669 PMCID: PMC8019454 DOI: 10.1186/s41235-021-00285-2
Source DB: PubMed Journal: Cogn Res Princ Implic ISSN: 2365-7464
Fig. 1 Flow Diagram for Study Inclusion/Exclusion. Note. Citations were searched within indexed citations of the original article by Woolley et al. (2010), “Evidence for a Collective Intelligence Factor in the Performance of Human Groups”.
Summary of Empirical Studies on Collective Intelligence
| Study name | n | Effect 1: % Var | Effect 2: % Pos. Manifold | Effect 3: c → criterion (r) | Effect 4: Av.IQ → c (r) | Effect 5: Av.IQ → criterion (r) | RA (Y:N) |
|---|---|---|---|---|---|---|---|
| ᵃ Credé and Howardson ( | | | | | | | 5:1 |
| ᵃ Bates and Gupta ( | | | | | | | Yes |
| Woolley et al. ( | 40 | 43.4 | 100 | .52 | .19, p = ns | .18, p = ns | Yes |
| Woolley et al. ( | 152 | 44.1 | 93 | .28 | .15, p = .04 | .18, p = ns | Yes |
| Engel, Woolley, Jing, Chabris, and Malone (, | 32 | 49.3 | 100 | | | | Yes |
| Engel, Woolley, Jing, Chabris, and Malone (, | 36 | 41.4 | 100 | | | | Yes |
| Engel et al. ( | 116 | 40 | 100 | .25 | | | No |
| Engel, Woolley, et al. ( | 25 | 40 | 100 | | | | Yes |
| Woolley & Aggarwal (under review); (Also reported in Woolley and Aggarwal | 59 | | | C1: .29 and C2: .29 | -.05, p = .53 | C1: -.02, p > .05; C2: -.21, p > .05 | No |
| Meslec et al. ( | 30 | | | | | | Yes |
| Glikson, Harush, et al. (under review) | 115 | | | .11 | | | No |
| Chikersal, et al. ( | 58 | | | | | | No |
| Kim et al. ( | 248 | 38.38 | | -.15 | | | Yes |
| Aggarwal et al. ( | 98 | 44 | | .58ᵉ | | | Yes |
| Barlow and Dennis ( | 86 | 42 | 50 | | | .07 (p > .05) | Yes |
| Barlow ( | 64 | | 33 | | | | Yes |
| Barlow ( | 65 | 46 | 100 | | .339, p = .026ᶜ | | Yes |
| Bates and Gupta ( | 26 | 39.8 | 100 | | | | Yes |
| Bates and Gupta ( | 40 | 50 | 100 | | | | Yes |
| Bates and Gupta ( | 40 | | 100 | | | | Yes |
| Rowe ( | 29 | 41 | 100 | .104, p = .59 | .294, p = .12 | .202, p = .29 | No |
| Mean or ratio | 71.5 | 43.03 | 90 | .253 | .185 | .067 | 13:5 |
Note: The table outlines empirical studies on collective intelligence and group performance published between October 2010 and November 2019. n = number of groups. K = number of independent samples. Effect 1 = Percentage of total variance in group IQ composite explained by first factor/component; Effect 2 = The proportion of positive correlations within the correlation matrix comprised of bivariate Pearson's correlations, r, between group IQ test items (positive manifold test); Effect 3 = Bivariate Pearson's correlation, r, between c and a criterion task; Effect 4 = Bivariate Pearson's correlation, r, between Av.IQ and c; Effect 5 = Bivariate Pearson's correlation, r, between Av.IQ and Criterion task; ns = p > .05; RA = Random Allocation to groups; F2F = Face-to-face. C1 and C2 = Criterion task 1 and 2 of a single study; r = Pearson’s correlation coefficient. Mean and Ratio scores include only primary data and therefore exclude previously meta-synthesized results from Credé and Howardson (2017a, 2017b), and Bates and Gupta (2017): Studies 2 and 3 (combined).
ᵃ Pooled data from secondary sources (≥ 2 studies) not included in the present analysis.
ᵇ This pertains only to the group IQ testing context and not to the group performance setting.
ᶜ Correlation exists for the EG only (the c-factor was not apparent in the CG).
ᵈ Paper originally added as a conference proceeding (Aggarwal & Woolley, 2014).
ᵉ Result was originally reported as an R² value, controlling for intercept and team size, then transformed to a correlation coefficient using √.34 = .58 (see Aggarwal et al. 2019, p. 6).
Standardized loadings and average variance extracted across 8 samples
| Group IQ subtest | Woolley et al. ( | Woolley et al. ( | Engel et al. ( | Engel et al. ( | Engel et al. ( | Barlow and Dennis (2014)ᵃ | Bates and Guptaᵇ | Rowe ( |
|---|---|---|---|---|---|---|---|---|
| Brainstorming | .32 | .58 | .7 | .7 | 1 | 1 | .38 | .57 |
| Matrix Reasoning | .73 | .61 | .72 | .47 | .43 | | .74 | .48 |
| Moral reasoning | .36 | .11 | | | | -.25 | .62 | |
| Plan shopping trip | .57 | .23 | | | | | .48 | |
| Typing | .69 | .48 | .67 | .71 | 0 | | .72 | |
| Word completion (Beginning with) | | .75 | | | | | | |
| Spatial problems | | .47 | | | | | | |
| Incomplete words (Missing letters) | | .47 | | | | | | |
| Estimation problem | | .32 | | | | | | |
| Reproducing art | | .34 | | | | | | |
| Unscramble words | | | .57 | .57 | .4 | | | |
| Sudoku | | | .61 | | | | | |
| Judgment tasks | | | .37 | .3 | | | | |
| Memory | | | .56 | .65 | .26 | | | .92 |
| Detection | | | .43 | .33 | .52 | | | |
| Decision | | | | | | -.14 | | |
| Mill Hill vocabulary | | | | | | | | .24 |
| Multiple choice vocabulary (synonyms) | | | | | | | | .14 |
| AVE (%) | 31.32 | 22.26 | 34.87 | 30.88 | 28.05 | 36.07 | 36.50 | 29.58 |
Note. Results display those reported across 8 samples (Bates and Gupta 2017 is in combined form) and indicate the standardized loadings of the c-factor onto the respective subtest. AVE = Average variance extracted based on the statistical average of the squared loadings from each of the subtest results in the samples listed above
ᵃ The standardized loading from the “complex task” was not included in this table because it was used as an external (predictive) validity criterion.
ᵇ These standardized loadings are taken from a multilevel structural equation model that combined data from the subtests used across studies 2 and 3 in Bates and Gupta (2017, p. 53).
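The AVE definition in the note (the average of the squared standardized loadings) can be checked with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and the input is the set of five loadings listed first in the Brainstorming, Matrix Reasoning, Moral reasoning, Plan shopping trip, and Typing rows, which are assumed here to belong to the first study column.

```python
def average_variance_extracted(loadings):
    """Return AVE as a percentage: the mean of the squared standardized loadings."""
    if not loadings:
        raise ValueError("at least one loading is required")
    return 100 * sum(value ** 2 for value in loadings) / len(loadings)


# Assumption: these five loadings form the first study column of the table.
first_column = [0.32, 0.73, 0.36, 0.57, 0.69]

ave = average_variance_extracted(first_column)
print(f"AVE = {ave:.2f}%")  # 31.32, matching the first AVE entry in the table
```

The same function can be applied to any other column once its loadings are listed.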
Counts of correlations in specified ranges for item matrices across 12 studies
| Correlation range: | −.20 to | −.10 to | .0 to | 0 to .10 | .10 to .20 | .20 to .30 | .30 to .40 | .40 to .50 | .50 to .60 | .60 to .70 | .70 to .80 | .80 to .90 | .90 to 1.0 | Count (total) | Weight (%) | % Positive |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Woolley et al. ( | 3 | 3 | 2 | 1 | 1 | 10 | 5.85 | 100 | ||||||||
| Woolley et al. ( | 3 | 12 | 13 | 8 | 5 | 4 | 45 | 26.32 | 93.33 | |||||||
| Engel, Woolley, Jing, Chabris, and Malone ( | 1 | 1 | 11 | 4 | 8 | 3 | 28 | 16.37 | 100 | |||||||
| Engel, Woolley, et al. ( | 4 | 3 | 4 | 1 | 2 | 1 | 15 | 8.77 | 100 | |||||||
| Engel, Woolley, et al. ( | 2 | 5 | 6 | 1 | 7 | 21 | 12.28 | 100 | ||||||||
| Barlow and Dennis ( | 1 | 1 | 1 | 2 | 1 | 6 | 3.51 | 50 | ||||||||
| Barlow ( | 1 | 1 | 1 | 3 | 1.75 | 33.33 | ||||||||||
| Barlow ( | 2 | 1 | 3 | 1.75 | 100 | |||||||||||
| Bates and Gupta ( | 2 | 4 | 2 | 1 | 1 | 10 | 5.85 | 100 | ||||||||
| Bates and Gupta ( | 2 | 2 | 1 | 2 | 2 | 1 | 10 | 5.85 | 100 | |||||||
| Bates and Gupta ( | 2 | 5 | 1 | 1 | 1 | 10 | 5.85 | 100 | ||||||||
| Rowe ( | 2 | 3 | 1 | 2 | 1 | 1 | 10 | 5.85 | 100 | |||||||
| Total Count: | 2 | 2 | 4 | 25 | 38 | 38 | 22 | 26 | 9 | 3 | 1 | 1 | 0 | 171 | 100.00 | |
| Percentage (%) | 1.17 | 1.17 | 2.34 | 14.62 | 22.22 | 22.22 | 12.87 | 15.20 | 5.26 | 1.75 | 0.58 | 0.58 | 0.00 | | | |
Note: Numbers represent counts of bivariate correlations within a specified range for a given study. Count (total) displays the total number of correlations per study (rows). Weight (%) refers to the proportion of the total number of correlations (N = 171) analyzed. % Positive calculates the proportion of bivariate correlations that are positive per study (rows). CG = Control group. EG = Experimental group
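The Weight (%) and % Positive columns follow directly from the counts described in the note. As a rough sketch (the function names are hypothetical), the second Woolley et al. row can be reproduced from its total of 45 correlations, assuming that 3 of them are negative, which is what the reported 93.33% implies.

```python
def weight_percent(study_count, total_count):
    """Share of all analyzed correlations contributed by one study, in percent."""
    return 100 * study_count / total_count


def percent_positive(study_count, negative_count):
    """Share of a study's bivariate correlations that are positive, in percent."""
    return 100 * (study_count - negative_count) / study_count


TOTAL_CORRELATIONS = 171  # N reported in the note

# Assumption: 3 negative correlations out of 45 for this row.
study_count, negative_count = 45, 3

print(f"Weight: {weight_percent(study_count, TOTAL_CORRELATIONS):.2f}%")  # 26.32
print(f"Positive: {percent_positive(study_count, negative_count):.2f}%")  # 93.33
```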
Fig. 2 Histogram of the Frequency (Count) of Distributions of Bivariate Correlations Between Group IQ Items (within study). Note. The frequency of negative and positive correlations between group IQ test items is represented by red and blue bars, respectively. The dotted line (- -) represents the trendline for the average frequency.
External criterion tasks
| Study | Criterion | Description |
|---|---|---|
| Woolley et al. ( | Computerized checkers | A group sat in front of a single screen, were trained for 5 min, and played a single match of checkers against a computerized opponent |
| Woolley et al. ( | Architectural design task | Groups design and build a house, garage, and pool with limited materials and strict building codes. (10 min planning, 20 min building) |
| Engel et al. ( | Student project | Student team projects were completed and rated by university students (peer review) |
| Woolley and Aggarwal (under review); (Also reported in Woolley and Aggarwal | Group learning | Slope (rate and size) of learning gains in 4 × repeated MBA student exams over 6 weeks |
| Woolley and Aggarwal (under review); (Also reported in Woolley and Aggarwal | Group synergy | As above. Synergy slope was measured against coordination and process gains attributable to groups once individual gains are controlled for on the slope |
| Glikson, Harush, et al. (under review); also reported in Woolley, Glikson, Haan, Harush, and Kim, (2018) | Group presentation | Student group PowerPoint presentation worth 40–60% of final subject score (establish new business in foreign country), measured with significant (e.g., semester) delay post group IQ test |
| Kim et al. ( | Group learning | 'League of Legends' video-game team learning behavior (via Edmondson’s scale of error detection / correction); repeated measure T1 = baseline and T2 = 6-month follow-up (learning = T2-T1) |
| Aggarwal et al. ( | Group learning | Group learning defined as “the rate of change (or slope) in earnings for each group across ten rounds of the [minimum-effort tacit coordination] game” (a behavioral economics game, see p.5). Results controlled for team size and intercept |
| Rowe ( | Group decision-making and prioritization task (moon survival) | A hypothetical situation in which a crew stranded on the moon must survive the journey back to their mothership with only 15 items salvaged from the wreck of their explorer craft. Items must be ranked according to their survival utility, and the rankings are compared against those of experts. (6 min) |
Note: Criterion tasks were always measured externally to, and after, the group IQ test battery. An additional study by Engel et al. (2014ab) used the Desert Survival Task or DST (as reported in study one of Engel et al. 2015ab) as the criterion and reported an outcome of b = .24 and p = .058. The DST asks groups to rank, in order of survival value, a random set of items while stranded in a desert. These results were not included in the meta-analysis because we were not clear about: (a) whether the beta coefficient was standardized or unstandardized; (b) what the predictor variable was (it was assumed to pertain to the regression weight of the c-factor); and (c) whether the score pertained to the online, face-to-face, or combined subset of the sample.
Fig. 3 Forest plot: The c-factor and Criterion Tasks. Note. Results are displayed for a random effects model pertaining to correlations between the c-factor and various criterion tasks. The c-factor scores are operationalized using factor/principal component regression weights. Most studies did not report which method was used to calculate these weights (e.g., Bartlett method). All studies included in this analysis, except for Rowe (2019), were undertaken by Woolley and colleagues and therefore employed procedures and measures described in Woolley et al. (2010) and/or Engel et al. (2014ab). Criterion tasks were different across all studies. Box sizes are relative to sample weights. *Two effects (team learning and team synergy) by Woolley and Aggarwal (under review) from one unique sample (n = 59) were included in this meta-analysis, leading to a total N = 916 when the sample was added on the basis of each unique effect or a total N = 857 when the sample was added on the basis of each unique group.
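A minimal sketch of how correlations can be pooled under a random effects model follows. It assumes Fisher's z transformation and the DerSimonian-Laird estimator of between-study variance; the caption does not state which estimator or software produced the forest plot, so this is an illustration and not the authors' procedure. The (r, n) pairs are the Effect 3 correlations and group counts from the summary table, with the two Woolley and Aggarwal effects entered separately (total N = 916); the function name is hypothetical.

```python
import math


def pool_correlations_random_effects(effects):
    """Pool (r, n) pairs using Fisher's z and a DerSimonian-Laird tau^2.

    Returns (pooled_r, ci_low, ci_high, tau2) with a 95% confidence interval.
    """
    z = [math.atanh(r) for r, _ in effects]
    w = [n - 3 for _, n in effects]  # inverse variance of z is n - 3
    sum_w = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum_w
    # Cochran's Q and the DerSimonian-Laird between-study variance
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Random effects weights add tau^2 to each study's sampling variance
    w_star = [1 / (1 / wi + tau2) for wi in w]
    z_random = sum(wi * zi for wi, zi in zip(w_star, z)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    low, high = z_random - 1.96 * se, z_random + 1.96 * se
    return math.tanh(z_random), math.tanh(low), math.tanh(high), tau2


# Assumption: (r, n) pairs read from the Effect 3 and n columns of the summary table.
c_factor_effects = [
    (0.52, 40), (0.28, 152), (0.25, 116), (0.29, 59), (0.29, 59),
    (0.11, 115), (-0.15, 248), (0.58, 98), (0.104, 29),
]

pooled_r, ci_low, ci_high, tau2 = pool_correlations_random_effects(c_factor_effects)
print(f"pooled r = {pooled_r:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], tau^2 = {tau2:.3f}")
```

Under these assumptions the pooled estimate should land near the reported r of .26, although the confidence interval need not match the published one because the estimator and the handling of the shared sample may differ.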
Fig. 4 Funnel plot: Publication Bias and the c-factor. Note. The funnel plot estimates the number of missing studies, assuming they retain the null (e.g., demonstrate zero correlation), required to reduce the Z-value below a .05 alpha (two-tailed) cutoff of 1.96. In this instance, the presence of empty dots (i.e., actual studies) and the absence of black-filled dots (i.e., imputed studies) suggest publication bias has not been detected using this test.
Fig. 5 Forest plot: Average IQ and Criterion Task Performance. Note. Results are displayed for a meta-analysis using a random effects model for correlations between average IQ scores and group performance on criterion tasks. The average IQ is operationalized as the sum of the IQ scores of the individual members divided by the number of members (= ∑IQ / n). The study by Barlow and Dennis (2016) did not apply the c-factor to the criterion task (the profit maximization task) because the authors found it was not valid. Studies behind the other (five) correlations, however, did apply both the c-factor and Av.IQ to the same criterion task, allowing for relative comparison. The Wonderlic Personnel Test (WPT) was used for all except Woolley et al. (2010) Study 1, where 18 odd items of the 36-item Raven’s Advanced Progressive Matrices were used, and Rowe (2019), where the ICAR-16 was used. Box sizes are relative to sample size weights. *Two effects (team learning and team synergy) by Woolley and Aggarwal (under review) from one unique sample (n = 59) were included in this meta-analysis, leading to a total N = 425 when the sample was added on the basis of each unique effect or a total N = 366 when the sample was added on the basis of each unique group.
Fig. 6 Funnel plot: Publication Bias and Average IQ. Note. The funnel plot demonstrates standard error estimates from actual and imputed (e.g., missing) studies; the former are indicated by white dots and the latter by black dots. The black-filled dot to the left of the axis suggests that a study showing a negative correlation between average member IQ and the criterion task could be missing from this review (e.g., file-drawered).
The relationship between statistical power and observed and expected effect sizes
| Study | Sample size (groups) | Power (%): Small | Power (%): Moderate | Power (%): Large | Power (%): c-factorᵃ | Power (%): Mean IQᵇ |
|---|---|---|---|---|---|---|
| Woolley et al. ( | 40 | 9.4 | 47.7 | 92.0 | 37 | 41.9 |
| Woolley et al. ( | 152 | 23.3 | 96.6 | 100.0 | 90.1 | 93.8 |
| Engel et al. ( | 32 | 8.4 | 39.2 | 85.3 | 30.3 | 34.3 |
| Engel et al. ( | 36 | 8.9 | 43.6 | 89.3 | 33.7 | 38.14 |
| Engel et al. ( | 116 | 18.8 | 91.1 | 100.0 | 80.8 | 86.2 |
| Engel et al. ( | 25 | 7.6 | 31.3 | 74.9 | 24.2 | 27.4 |
| Woolley and Aggarwal (under review); (Also reported in Woolley and Aggarwal | 59 | 11.7 | 64.6 | 98.5 | 51.5 | 57.7 |
| Meslec, et al. ( | 30 | 8.2 | 37.0 | 82.8 | 28.6 | 32.3 |
| Glikson, Harush, et al. (under review) | 115 | 18.6 | 90.8 | 100.0 | 80.4 | 85.9 |
| Chikersal, et al. ( | 58 | 11.6 | 63.9 | 98.4 | 50.8 | 56.9 |
| Kim et al. ( | 248 | 35.0 | 99.8 | 100.0 | 98.6 | 99.4 |
| Aggarwal et al. ( | 98 | 16.5 | 85.8 | 100.0 | 73.8 | 79.9 |
| Barlow and Dennis ( | 86 | 15.0 | 81.0 | 99.9 | 68 | 74.4 |
| Barlow ( | 64 | 12.3 | 68.3 | 99.0 | 54.9 | 61.3 |
| Barlow ( | 65 | 12.4 | 69.0 | 99.2 | 55.6 | 62 |
| Bates and Gupta ( | 26 | 7.7 | 32.5 | 76.7 | 25.1 | 28.4 |
| Bates and Gupta ( | 40 | 9.4 | 47.7 | 92.3 | 37 | 41.9 |
| Bates and Gupta ( | 40 | 9.4 | 47.7 | 92.3 | 37 | 41.9 |
| Rowe (unpublished doctoral thesis) | 29 | 8.1 | 35.9 | 81.4 | 27.7 | 6.4 |
| Proportion of studies with acceptable (≥ 80%) power: | | 0 of 19 (0%) | 6 of 19 (31.6%) | 17 of 19 (89%) | 4 of 19 (21.1%) | 4 of 19 (21.1%) |
Note: Calculations of statistical power are based on the actual sample size for included studies. All power calculations are written as percentage terms (%); categories for the magnitude of association are based on the conventions of Cohen (1988); calculations are made using G*Power 3.1 software; correlations are for bivariate normal models (Pearson's r) and computed post hoc based on alpha error probability of .05 (two-tailed) and power of 80% (1 − β = .80); power calculated based on tests against a null model (r ~ 0). Sample size is based on the actual number of groups included in the study and/or condition.
ᵃ This value is based on the sample weighted correlation derived from the meta-analysis reported in Fig. 3.
ᵇ This value is based on a sample weighted correlation derived from three meta-analyses investigating the relationship between average (or sum) IQ scores and group performance (Bell 2007; Devine and Philips 2001; Stewart 2006).
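The power figures in the table can be approximated without G*Power. The sketch below is an assumption-laden illustration: it uses the Fisher z normal approximation and not the exact bivariate normal routine named in the note, so it will differ slightly from the tabled values. The function name is hypothetical, and the small, moderate, and large effects are taken as r = .10, .30, and .50 following Cohen's conventions.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approximate_power(r, n_groups, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r.

    Uses the Fisher z approximation: atanh(sample r) is treated as normal
    with standard error 1 / sqrt(n - 3), tested against a null of r = 0.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n_groups - 3)
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Assumption: Cohen's conventional effect sizes for correlations.
for label, r in [("small", 0.10), ("moderate", 0.30), ("large", 0.50)]:
    for n in (40, 152):
        print(f"{label:>8} (r = {r:.2f}), n = {n:>3}: {100 * approximate_power(r, n):.1f}%")
```

For 40 groups and a moderate effect this approximation gives roughly 47%, close to the 47.7 in the table; the small gap reflects the approximation, not a different design.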