
The idiosyncratic nature of confidence.

Joaquin Navajas¹,², Chandni Hindocha³,⁴, Hebah Foda³, Mehdi Keramati⁵, Peter E Latham⁵, Bahador Bahrami³.

Abstract

Confidence is the 'feeling of knowing' that accompanies decision making. Bayesian theory proposes that confidence is a function solely of the perceived probability of being correct. Empirical research has suggested, however, that different individuals may perform different computations to estimate confidence from uncertain evidence. To test this hypothesis, we collected confidence reports in a task where subjects made categorical decisions about the mean of a sequence. We found that for most individuals, confidence did indeed reflect the perceived probability of being correct. However, in approximately half of them, confidence also reflected a different probabilistic quantity: the perceived uncertainty in the estimated variable. We found that the contribution of both quantities was stable over weeks. We also observed that the influence of the perceived probability of being correct was stable across two tasks, one perceptual and one cognitive. Overall, our findings provide a computational interpretation of individual differences in human confidence.

Year: 2017 PMID: 29152591 PMCID: PMC5687567 DOI: 10.1038/s41562-017-0215-1

Source DB: PubMed Journal: Nat Hum Behav ISSN: 2397-3374

Introduction

Understanding the computational basis of individual differences in human cognition has fundamental implications for medical and biological sciences, as well as for economics and the social sciences. A prime example is confidence, which plays a key role in a wide range of aspects in life, including learning to make better decisions1, monitoring our actions2, cooperating effectively with others3, 4, and displaying good political judgment5. One of the most intriguing features of confidence is that humans tend to communicate this feeling in a largely idiosyncratic way: although confidence reports are typically stable within each person, they tend to be variable across the population6, 7. For instance, different individuals performing the same task generate distributions of confidence ratings with different mean and shape7. In addition, the correlation between confidence and objective performance varies for different people, and is related to individual variations in brain structure8 and connectivity9, 10. While a vast literature has focused on the biological correlates of individual differences in human confidence8–10, the computational roots of this phenomenon remain unclear. Previous research in sensory psychophysics8, 11 and value-based decision making10, assumed that confidence is a function solely of the perceived probability of being correct. This assumption is reasonable: confidence should reflect only this subjective probability12–14. Driven by this normative framework, previous studies explained differences among people as measurement noise15, or as individual differences in the ability to report the probability of being correct8, 9. This may have been an oversimplification: there is extensive literature showing that confidence is influenced by factors other than the probability of being correct16, such as the reliability of sensory stimuli2, 13, the magnitude of sensory data11, post-decisional biases17, and even personality traits7. 
Here we set out to determine what probabilistic quantities, besides perceived probability of being correct, contribute to individual differences in human confidence. We focused on a categorical task, in which subjects had to decide whether the mean of a set of items was above or below a decision boundary, and then report their confidence. For about half of the subjects, confidence did depend solely on the perceived probability that they were correct. However, for the other half, confidence also depended on a different statistical quantity: their uncertainty in the estimate of the mean18, 19. Moreover, the dependence of confidence on the perceived probability of being correct and uncertainty was stable across experiments performed weeks apart. Finally, the dependence of confidence on the perceived probability of being correct was stable across tasks involving uncertainty in the perceptual and cognitive domain, but the dependence on the perceived uncertainty was not. This is consistent with the predictions of a recent theoretical account arguing that uncertainty is encoded by domain-specific neural populations14. Overall, these findings provide a computational interpretation of individual differences in the human sense of confidence.

Results

In a perceptual task (Experiment 1), participants observed a sequence of 30 tilted Gabor patches presented at the fovea in rapid (4 Hz) serial visual presentation (Fig. 1a). At the end of the sequence, participants decided whether the mean orientation of the patches was clockwise or counter-clockwise relative to vertical. Participants then reported how confident they were in their decision on a scale from 1 to 6. To manipulate uncertainty, we pseudo-randomly drew the orientation samples from uniform distributions with exactly the same mean (+3 degrees or -3 degrees) but different variances on different trials (Fig. 1b). Participants performed better as variance decreased (Fig. 1c, one-way repeated measures ANOVA, F(3,29)=231.4, p<10⁻¹⁰).

Figure 1

Tracking mean evidence in rapid serial visual presentations. (a) 30 tilted Gabor patches were serially flashed at the fovea, updated at 4 Hz. Participants made a binary decision about whether the mean in the sequence was tilted to the right or left, followed by a confidence rating. Full details of the task are available in Online Methods. (b) The samples were drawn from a uniform distribution with mean, m, set to either exactly +3 degrees or exactly -3 degrees. The dashed line shows m=+3. The endpoints of the uniform distributions were m±v, with v = 10, 14, 24, or 45 degrees, yielding four conditions with four different variances. (c) Performance increased with decreasing variance. Dots show the average performance across subjects, and vertical lines depict the s.e.m. The solid black curve shows the best fit of the stochastic updating model (Equations [1] and [2]). (d) Confidence reports averaged over all subjects. Vertical lines show s.e.m. At the population level, confidence in incorrect trials remains approximately constant as a function of variance.

To fit the choices of each participant, we assumed that they keep track of the mean orientation, which they update after each stimulus presentation. To update their estimate of the mean within each trial, we considered a model in which participants combine a noisy estimate of the current sample with their previous estimate of the mean (Equation [1]), where μ_i is the estimate of the mean after i samples (μ0 = 0), λ (0 < λ < 1) determines the relative weighting of recent versus more distant samples, θ_i is the actual orientation of the i-th sample in the sequence, ξ_i is sampled from the standard normal distribution, and γ is a free parameter indicating the strength of the noise. The multiplicative nature of the noise ensures that the uncertainty in the update of the estimate scales with the size of the observed sample, θ_i. At the end of the sequence, choice is determined by the sign of the final value of the mean (μ30): the agent chooses clockwise if μ30 is positive, and counter-clockwise if μ30 is negative. This model explains two important quantitative patterns observed in our behavioural data. First, all items in the sequence had a significant influence on choice (regression weights against zero, t(29)>3.17, p<0.003 for all items), but later samples had more influence than earlier ones (slope of regression weights against zero, t(29)=4.70, p=10⁻⁶). This recency effect was modulated by the learning rate λ (Supplementary Fig. 1). Second, we observed that items in high variance sequences had smaller influence on choice (F(3,29)=57.8, p~0), indicating larger integration noise in these trials. The last term in Equation [1], modulated by γ, captures this pattern (Supplementary Fig. 2). We also tested an alternative model that tracks the mean of the sequence in a deterministic way, and then makes stochastic decisions. This model, however, failed to explain the trend in Fig. 1c, which shows that performance increases as variance decreases (see Supplementary Fig. 3 for details and model comparison).
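
The updating scheme described in this paragraph can be sketched in a few lines of Python. The displayed form of Equation [1] is not reproduced in this text, so the update used here, μ_i = (1 − λ) μ_{i−1} + λ θ_i (1 + γ ξ_i), is an assumption chosen to match the verbal description (a leaky average with noise that scales with the sample), not necessarily the authors' exact equation. All function names are hypothetical.

```python
import random


def draw_sequence(mean, half_width, n, rng):
    """Draw n orientations from a uniform distribution on mean +/- half_width."""
    return [rng.uniform(mean - half_width, mean + half_width) for _ in range(n)]


def simulate_trial(thetas, lam, gamma, rng):
    """Return the final estimate of the mean after all samples.

    ASSUMED update (not taken verbatim from the paper):
        mu_i = (1 - lam) * mu_{i-1} + lam * theta_i * (1 + gamma * xi_i)
    with xi_i drawn from a standard normal and mu_0 = 0.
    """
    mu = 0.0
    for theta in thetas:
        noisy_sample = theta * (1.0 + gamma * rng.gauss(0.0, 1.0))
        mu = (1.0 - lam) * mu + lam * noisy_sample
    return mu


def choose(mu_final):
    """Choice follows the sign of the final estimate."""
    return "clockwise" if mu_final > 0 else "counter-clockwise"


if __name__ == "__main__":
    rng = random.Random(1)
    # Half-width 10 degrees around a mean of +3 degrees, 30 samples per trial.
    thetas = draw_sequence(3.0, 10.0, 30, rng)
    mu30 = simulate_trial(thetas, lam=0.2, gamma=0.5, rng=rng)
    print(choose(mu30), round(mu30, 2))
```

With γ = 0 the sketch reduces to a deterministic exponentially weighted average, in which larger λ gives more weight to recent samples.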

Computation of confidence

In this task, confidence should reflect the perceived probability of being correct, for which participants need to have an estimate of the variance of μ30. We assumed that they are able to compute the true variance associated with Equation [1] (although our findings do not require this assumption, see Supplementary Notes). Thus, the perceived variance is given by Equation [2]. The model described by Equations [1] and [2], which we call the stochastic updating model, is illustrated in Fig. 2a. Given μ30 and the perceived variance, subjects can compute, on each trial, the perceived probability of being correct, p̂(correct) (shaded area under the Gaussian distribution in Fig. 2a).
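
As an illustration of how these quantities relate, the sketch below computes a perceived variance, the corresponding p̂(correct), and the Fisher information for one trial. It rests on two assumptions that are not confirmed by this text: the variance formula is derived from an assumed update μ_i = (1 − λ) μ_{i−1} + λ θ_i (1 + γ ξ_i), and p̂(correct) is taken to be the Gaussian mass on the chosen side of zero, Φ(|μ30| / σ). Function names are hypothetical.

```python
import math


def perceived_variance(thetas, lam, gamma):
    """Variance of the final estimate under the ASSUMED update rule.

    Each sample contributes independent noise lam * gamma * theta_i * xi_i,
    which is then discounted by (1 - lam) once per later sample.
    """
    n = len(thetas)
    return sum(
        (lam * gamma * theta) ** 2 * (1.0 - lam) ** (2 * (n - i))
        for i, theta in enumerate(thetas, start=1)
    )


def p_correct(mu_final, variance):
    """ASSUMED form: Gaussian probability mass on the chosen side of zero."""
    z = abs(mu_final) / math.sqrt(variance)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fisher_information(variance):
    """Observed Fisher information, taken here as the inverse variance."""
    return 1.0 / variance


if __name__ == "__main__":
    thetas = [5.0, -2.0, 8.0, 1.0]
    var = perceived_variance(thetas, lam=0.2, gamma=0.5)
    print(p_correct(1.2, var), fisher_information(var))
```

Under these assumptions p̂(correct) depends on both the final estimate and its variance, whereas Fisher information depends on the variance alone, which is what lets the two be entered as separate predictors of confidence.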

Figure 2

Estimating confidence. (a) Each trial consists of 30 presentations of tilted Gabor patches. At each presentation (θ) the mean (μ) is updated by combining the estimate on the previous sample with a noisy version of the current Gabor patch. The black line represents one realisation of the model. At the end of the sequence, the subject makes a decision based on the sign of μ30. The subjective probability of being correct and the observed Fisher information are then computed according to the equations shown in the right panel; see Online Methods for full details. (b) The perceived probability of being correct, p̂, averaged over variance condition for correct trials (solid grey line) and incorrect trials (dashed black line), and also averaged across participants. For correct trials, this quantity increases with decreasing variance (solid grey line); for incorrect trials it shows the opposite pattern (dashed black line, see ref.15 for more details). (c) The uncertainty in the estimate of μ30, quantified by the observed Fisher information, increases both for correct and incorrect trials (same markers as panel b).

Using this model, we estimated the expected values of p̂(correct) for different variance conditions (see Methods, Equation [9], and Fig. 2b). When we separated by correct and incorrect trials, we observed a pattern that has been suggested based on normative arguments15, 20: confidence on correct trials should increase as the variance decreases, whereas confidence on error trials should show the opposite effect, and decrease as the variance decreases. We did not, however, observe this pattern in our data, at least not on average: as shown in Fig. 1d, confidence on correct trials did indeed increase as variance dropped, but on error trials confidence was relatively independent of variance (F(3,29) = 0.57, p = 0.63). This last observation indicates that, again on average, subjects were misestimating confidence: they should have been less confident on low-variance error trials than on high-variance error trials, as their probability of being correct was lower (dashed curve in Fig. 2b). This suggests that subjects partially based their confidence on the uncertainty in the value of the mean orientation – a reasonable, if suboptimal, heuristic. Under this heuristic, low-variance trials would raise their confidence relative to high-variance ones. An appropriate weighting of perceived probability of being correct, shown in Fig. 2b, and a function of uncertainty such as the observed Fisher information (the inverse of the perceived variance), shown in Fig. 2c, could, therefore, explain the confidence ratings observed in Fig. 1d. To formally test this proposal, we compared the normative model of confidence based on only p̂(correct) with 7 alternative models based on different linear combinations of p̂(correct), mean, standard deviation, variance and Fisher information (Supplementary Figure 4). We evaluated which combination provided a better fit to confidence ratings using ordinal logistic regressions (see Methods).
The normative model based on just p̂(correct) had one parameter per subject, whereas the alternative models had two parameters for each participant. Our data supported extending the normative model by adding a second parameter, uncertainty in the estimated mean, quantified by either standard deviation, variance or Fisher information (Wilcoxon sign-rank test for deviance: z = 4.78, p = 10⁻⁶ for standard deviation; z = 4.73, p = 10⁻⁶ for variance; z = 4.73, p = 10⁻⁶ for Fisher information). These three models were statistically indistinguishable from each other (z < 1.7, p > 0.1 for all pairwise comparisons, see Supplementary Fig. 4 for more details). This analysis indicates that uncertainty in orientation does indeed influence confidence. To analyse this finding in more detail – and in particular to quantitatively examine inter-subject differences – we need to choose a particular function of uncertainty. Because standard deviation, variance and Fisher information are related by invertible transformations, it is fundamentally impossible to determine which function is used by the brain (see Supplementary Notes). Instead, we ask which quantity is the best linear predictor of confidence in an ordinal regression model. To do that, we conducted a separate experiment in which the perceived probability of being correct played no role. We asked participants to estimate the average orientation in the sequence of Gabor patches and to rate their confidence (see Control Experiment in Methods). This experiment was very similar to Experiment 1: on each trial the angles of the Gabor patches were drawn from uniform distributions with one of four different variances (the same used in Experiment 1). However, rather than just two possible means, the mean was randomly chosen from a uniform distribution over the whole range of orientations. Consequently, participants did not make a categorical decision, as in the previous experiment; instead, they estimated the value of the mean.
Therefore, their reported confidence was not about the probability that they were correct, but about their uncertainty in the estimate of the mean. As the variance in the sequence decreased, responses were more accurate (F(3,9) = 13.21, p = 10⁻⁵) and more confident (F(3,9) = 37.4, p = 10⁻⁹, see Supplementary Fig. 5). We regressed confidence against single-trial estimates of either Fisher information, variance or standard deviation. These fits were significantly better when using Fisher information rather than variance (Wilcoxon sign-rank test for difference in log-likelihood, z=2.8, p=0.005) or standard deviation (z=2.9, p=0.004). These results suggest that it is reasonable to use Fisher information to quantify uncertainty. (For additional details, see Methods and Supplementary Figure 5.)

Individual differences and their stability over time

The analysis presented so far is based on population-averaged data (Fig. 1d), so it is uninformative about differences among individuals. To determine whether, and how, p̂(correct) and Fisher information influence confidence within subjects, we looked at the data of each individual. As expected6, 7, we observed substantial inter-individual differences (Fig. 3). Some subjects did indeed base confidence solely on p̂(correct). However, in approximately half of them, confidence appeared to be influenced – at least to some degree – by Fisher information. To quantify this, we regressed21 confidence reports against model-based estimates of p̂(correct) and information. Fig. 3 shows a scatter plot of the regression weights for p̂(correct) and Fisher information. In 13 out of the 30 participants, confidence significantly reflected p̂(correct) but not information. In 14 other participants, however, confidence significantly reflected both p̂(correct) and information. One participant’s confidence conveyed only information but not p̂(correct), and finally, for two participants, confidence did not reflect either of the two quantities.

Figure 3

Analysis of confidence across individuals. The main panel in the lower left shows regression weights on confidence for different individuals. x-axis: weight of the probability of being correct (β_p); y-axis: weight of information (β_I). Each dot is a different participant, and the colour codes for significance (at the 0.05 level) as follows: dark green, only β_p was significant; light green, both β_p and β_I were significant; yellow, only β_I was significant; grey, neither was significant. Insets along the top and right margins show average confidence and confidence distributions for four representative participants. Left plots: mean confidence across different variance conditions, split by correct (solid grey line) and incorrect (dashed black line) trials. Right plots: probability distribution over confidence. For participant #19 (yellow dot), confidence reflected only information: confidence increased with variance for incorrect trials. For participant #16 (dark green dot), confidence reflected only the perceived probability of being correct: confidence in error trials decreased with increasing variance. For participant #27 (light green dot), confidence reflected a mixture of both computations. For participant #24 (grey dot), confidence was not modulated by either of these quantities. Note that there are large differences in confidence distributions, with subjects #24 and #27 showing far more confidence than subjects #16 and #19. Because α3 is the fraction of trials with confidence larger than 3, that quantity is larger for subjects #24 and #27 than for subjects #16 and #19.

The ordinal regression identified seven parameters for each individual (see Methods, Equation [10]): a weight for p̂(correct), denoted β_p; a weight for information, denoted β_I; and five parameters α_j (j = 1, …, 5). The latter are the average log odds of observing a confidence rating greater than j; from these we selected the mid-value, α3, which is based on splitting the confidence scale in halves. The parameter α3 was correlated with the average confidence across the entire experiment (r=0.84, p<10⁻⁸), and so indicates how under- or overconfident a given participant is; we thus refer to α3 as the overall confidence. We confirmed that individual differences in these parameters (β_p, β_I, and α3) are not simply explained by how well our model fit decisions (see Supplementary Notes). The three selected variables were uncorrelated with each other across the population (r<0.35, p>0.1 for all pairwise comparisons between β_p, β_I, and α3). Finally, we note that while subjects were required to report confidence, they did not explicitly use it to, for example, regulate learning1 or make collective decisions3. Thus, we know only that β_p and β_I link perceived probability of being correct and Fisher information to confidence reports, which could in principle differ from internal computations of confidence11. To explore this issue, we regressed reaction time against perceived probability of being correct and Fisher information, as previous studies have shown that reaction time correlates with the computation of confidence22, 23. The regression coefficients based on reaction time were highly correlated with β_p and β_I (Supplementary Fig. 6), suggesting that confidence ratings reflected the computation of confidence. This analysis would be no more than a model-fitting exercise if a different profile – that is, a different relationship between confidence, p̂(correct), and Fisher information – emerged when the same participants were retested.
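
To make the role of the seven parameters concrete, the sketch below turns one participant's parameters into a probability for each confidence rating. Equation [10] is not reproduced in this text, so the cumulative-logit form used here, log odds of (rating > j) = α_j + β_p · p̂(correct) + β_I · information, is an assumption based on the description of α_j as log odds of a rating greater than j. The function and argument names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rating_probabilities(p_hat, info, alphas, beta_p, beta_i):
    """Probability of each rating 1..6 under an ASSUMED cumulative-logit model.

    alphas holds the five cut-point parameters (alpha_1 .. alpha_5) and should
    be non-increasing, so that P(rating > j) shrinks as j grows.
    """
    drive = beta_p * p_hat + beta_i * info
    exceed = [sigmoid(a + drive) for a in alphas]  # P(rating > j), j = 1..5
    cumulative = [1.0] + exceed + [0.0]            # P(rating > 0) .. P(rating > 6)
    return [cumulative[k] - cumulative[k + 1] for k in range(6)]


if __name__ == "__main__":
    alphas = [2.0, 1.0, 0.0, -1.0, -2.0]
    print(rating_probabilities(0.8, 0.1, alphas, beta_p=1.5, beta_i=0.5))
```

In this form, raising α3 shifts probability towards the upper half of the scale regardless of the trial, which is why it can be read as an overall confidence level, while β_p and β_I control how strongly ratings move with the two trial-by-trial quantities.
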
To test for stability, in Experiment 2 we retested 14 of the participants from Experiment 1 approximately one month later. We observed that the three variables (β_p, β_I, α3) were correlated across experiments (Fig. 4), indicating that this decomposition is stable across time and informative of the identity of the participants. To further validate this observation, we found that the distance in the 3-dimensional space defined by (β_p, β_I, α3) within participants (across the two experiments) was smaller than the distance between different participants within an experiment (Wilcoxon rank sum test, z=4.0, p<10⁻⁴). This shows that our computational model of confidence is stable across different experimental sessions (see Discussion for comparison with previous studies).
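
The within- versus between-participant comparison can be sketched as follows. This is a minimal illustration that assumes plain Euclidean distance on unscaled parameter triplets, which the text does not specify; the function name and the toy values are hypothetical and are not data from the study.

```python
import math
from itertools import combinations


def within_between_distances(session_a, session_b):
    """Distances between parameter triplets (weight of p_correct, weight of
    information, overall confidence).

    session_a[k] and session_b[k] must belong to the same participant.
    Returns (within, between): within pairs each participant with themselves
    across sessions; between pairs different participants within a session.
    """
    within = [math.dist(a, b) for a, b in zip(session_a, session_b)]
    between = []
    for session in (session_a, session_b):
        between.extend(math.dist(x, y) for x, y in combinations(session, 2))
    return within, between


if __name__ == "__main__":
    # Toy values for three hypothetical participants, two sessions each.
    first = [(1.0, 0.2, 0.5), (2.5, 1.0, -0.3), (0.4, 0.0, 1.2)]
    second = [(1.1, 0.3, 0.4), (2.2, 0.8, -0.1), (0.5, 0.1, 1.0)]
    within, between = within_between_distances(first, second)
    print(sum(within) / len(within), sum(between) / len(between))
```

The two lists of distances would then be compared with a rank-based two-sample test, as in the Wilcoxon rank sum test reported above.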

Figure 4

Stability across time. 14 participants of Experiment 1 were retested approximately one month later (35.2±2.4 days; range = 23-49 days). We probed stability by asking how much our three parameters (β_p, β_I and α3) changed across experiments. (a-c) Correlation across experiments for β_p (a), β_I (b), and α3 (c). Each square is a different participant, the dotted line is the identity, and the value of r given in each box is the Pearson correlation coefficient. The three variables were significantly correlated across experiments, suggesting that this decomposition is stable across time. A non-parametric method to measure rank correlation across experiments yielded similar results (Spearman’s rank correlation, r=0.82, p<0.001 for β_p, r=0.54, p<0.05 for β_I, and r=0.55, p<0.05 for α3). A robust regression that underweights potential outliers further supported these findings (β_p: regression coefficient 0.59±0.14, p=0.001; β_I: regression coefficient 0.74±0.27, p=0.02; α3: regression coefficient 0.60±0.18, p=0.005).

Consistency across tasks

To determine whether subjects compute confidence the same way across tasks – that is, whether they give the same weight to p̂(correct) and Fisher information, and have the same overall confidence – we repeated our experiments on a cognitive task: averaging a sequence of numbers. In Experiment 3, a new group of 20 participants performed, in counterbalanced order, the visual task described above and a numerical averaging task (Fig. 5). In the numerical task, we presented two-digit numbers, updated at the same rate as in Experiment 1 (4 Hz). The task was to decide whether the mean of the sequence was greater or smaller than 50. Uncertainty was manipulated in the same way as in Experiment 1, using a set of variances that ensured comparable performance across tasks (see Methods).

Figure 5

Decisions and confidence in Experiment 3 (N=20). (a,c): Visual task (replication of Experiment 1 with different participants; panel a corresponds to Fig. 1c and panel c to Fig. 1d). (b) Same as (a), but for the numerical task. (d) Same as (c), but for the numerical task. The similarity between panels a and b, and between panels c and d, indicates that, at least on average, the visual and numerical tasks lead to remarkably similar behaviour, despite the fact that one is perceptual and the other is cognitive.

In both tasks, accuracy increased with decreasing variance (Fig. 5a,b). A two-way repeated measures ANOVA with factors “variance” and “task” showed a significant main effect of variance (F(3,19)=194.3, p<10⁻¹⁰) but a non-significant effect of task (F(1,19)=2.5, p=0.13) or interaction (F(3,19)=0.84, p=0.47). Importantly, replicating Experiment 1, variance did not modulate confidence in error decisions (F(3,19)=0.2, p=0.89 for the visual task; F(3,19)=1.1, p=0.4 for the numerical task). Confidence in the visual task was not statistically different from confidence in the numerical task (F(1,19)=1.58, p=0.22, Fig. 5c,d). As in the visual task, later numbers had more influence on choice than earlier numbers (F(5,19)=18.0, p=10⁻¹²) (Supplementary Fig. 1), and numbers in the high variance condition had a smaller influence on choice than numbers in the low variance condition (F(3,19)=19.4, p=10⁻⁹) (Supplementary Fig. 2). We therefore used the same stochastic updating model (Equations [1] and [2]) to fit the data in Experiment 3. Also consistent with the visual task, decisions were better fit by this model than the alternative model we considered in the visual task (difference in log-likelihood against zero: t(19)=5.2, p<10⁻⁴ for the cognitive task; t(19)=6.4, p<10⁻⁵ for the perceptual task). We regressed confidence against p̂(correct) and Fisher information, and, as in Experiment 1, about half the subjects based confidence solely on p̂(correct), and about half also took into account Fisher information (see Supplementary Figs. 7 and 8). We also provided independent evidence that, in the numerical task, Fisher information was more linearly predictive of confidence reports than other functions of variance (Supplementary Fig. 5). We asked if our three regressors (β_p, β_I and α3) were consistent across the numerical and visual tasks.
The within-participants distance in the 3-dimensional space was smaller than the between-participants distance (Wilcoxon rank sum test, z=3.3, p<0.001), suggesting that they were – at least in aggregate. And indeed, the weight of perceived probability of being correct, β_p, and the overall confidence, α3, were significantly correlated across tasks (r=0.74, p<0.001 and r=0.63, p<0.01, respectively). However, the weight of Fisher information, β_I, was uncorrelated across tasks (r=0.20, p=0.37), indicating that Fisher information has quantitatively different effects on confidence in visual and numerical tasks (Fig. 6). This result is in agreement with a recent theoretical account arguing that the inverse variance is represented by domain-specific neural populations14 (see Discussion).

Figure 6

Consistency across tasks involving uncertainty in the perceptual and cognitive domain. 20 participants that were not tested in Experiments 1 or 2 performed one visual and one numerical task (Experiment 3). As in Fig. 3, we decomposed confidence in terms of the weight of p̂(correct) (β_p), the weight of information (β_I), and the overall confidence (α3). (a-c) Correlation across tasks for β_p (a), β_I (b), and α3 (c). Each square is a different participant, the dotted line is the identity, and the value of r given in each box indicates the Pearson correlation coefficient. β_p and α3 were positively correlated across tasks; however, the weights of Fisher information, β_I, were uncorrelated across tasks. A non-parametric method to measure the correlation across experiments yielded similar results (r=0.68, p<0.01 for β_p, r=0.22, p=0.35 for β_I, and r=0.62, p<0.01 for α3).

Discussion

The computations underlying confidence have attracted considerable attention over the last several years, in part due to recent developments in model-based approaches12–14 combined with neurophysiological recordings in non-human animals24–26 and neuroimaging in humans8–10, 27. The standard approach consists of fitting a model to the entire population and treating inter-individual variability as noise11, 15. However, if such individual differences are robust over time, and consistent across tasks7, then treating them as noise limits our understanding of the computational processes underlying confidence. Here we found that inter-individual differences in confidence ratings are meaningful in terms of their underlying computations. In particular, we found that different individuals used different weightings for two probabilistic quantities: their perceived probability of being correct, and their uncertainty in their estimate of the task-relevant variable14, the latter quantified by the observed Fisher information18, 19. We isolated the contribution of each of these two quantities to confidence, and measured, for each individual: 1) the influence of the perceived probability of being correct on confidence (β_p), 2) the influence of Fisher information on confidence (β_I), and 3) the participants’ overall confidence (α3). All three variables were stable across several weeks (Fig. 4), and two of them (β_p and α3) were stable across different tasks – one in the perceptual domain; the other in the cognitive domain (Fig. 6). Normative theories of decision-making postulate that confidence should depend solely on the probability of being correct12–14. We speculate that the perceived uncertainty about task-relevant variables could serve as a mental shortcut – a convenient heuristic – that provides a proxy for the probability of being correct28. This shortcut is reasonable, as uncertainty correlates with decision performance in our experiments (Figs. 1c and 2c).
Previous research in our group showed that confidence can reflect the magnitude of sensory data11, a choice-independent quantity that also correlates with behavioural performance. Our finding that a heuristic computation modulates confidence judgements about categorical decisions is in line with this study. Our model of confidence assumes that subjects linearly combine the normative computation of p̂(correct) with a function of variance. However, we cannot rule out the possibility that subjects compute p̂(correct) suboptimally – for example, by partially basing it on the uncertainty in the task relevant variable – and then computing confidence based solely on their suboptimal estimate of p̂(correct). While further experiments are needed to disentangle these alternatives, we consider the former explanation to be more likely than the latter. Indeed, many studies suggest that confidence is a multivariate function that depends on factors such as the structure of the task11, the social context29, and post-decisional biases17. Previous research has shown reliable individual differences in the mean and shape of the distribution of confidence ratings6, 7, and in the extent to which confidence predicts behavioural accuracy7, 8. These properties are believed to be idiosyncratic and correlate with individual variations in personality trait7, brain structure8, and resting-state functional connectivity9. For example, individual differences in the correlation between confidence and accuracy were systematically linked to a frontal network including the anterior prefrontal cortex, ventro-medial prefrontal cortex, and rostro-lateral prefrontal cortex8, 10, 30, 31. These findings were based on decisions in a wide range of contexts, including visual8 and value-based10 judgments. 
Although these studies provided interesting insights about the brain regions that correlate with individual differences in confidence, none of them explicitly asked what probabilistic quantities influence this variability. Here, we provide empirical evidence that the idiosyncratic nature of confidence is due to differences in the computation of confidence; more specifically, different individuals place different weighting on the perceived probability of being correct and the perceived uncertainty in the estimate of the task-relevant variable. In principle, we could have used any function of variance to quantify uncertainty, and indeed all tested functions provide equally good fits in our categorical task (see Supplementary Fig. 4). We chose to model the influence of uncertainty as linear changes in Fisher information (inverse variance) only because it provided the best linear fits to confidence in a separate experiment (see Supplementary Fig. 5). The idea that the inverse variance could modulate confidence has been previously proposed and tested in several studies1, 2, 17, 32, 33. In ref. 32, subjects judged the mean orientation of a set of lines, and the authors found that confidence reports underweighted the stimulus variance. However, whether the model parameters of that study were stable over time or consistent across domains remains unknown. In ref. 33, participants observed random-dot motion in two conditions: one with low mean and low variance, and the other with high mean and high variance. Although performance was the same for both conditions, some participants gave higher confidence ratings in one condition or the other. A model in which different subjects gave different weights to signal-to-noise ratio and inverse variance fit these data but, critically, the fit was unstable over time (the weight of signal-to-noise ratio was uncorrelated across a test-retest). In principle, this is at odds with our finding that the weight of p̂(correct) was stable over time.
However, we should emphasise that the signal-to-noise ratio is different from p̂(correct): while the signal-to-noise ratio is an objective quantity that depends only on stimulus properties, p̂(correct) is a subjective quantity that depends on the decision and how the subject learned about the stimulus (see Equations [5] to [9] in Methods). Here, instead of fitting confidence against physical properties of the stimuli, we focused on a normative theory based on the perceived (rather than the actual) probability of being correct, and explained individual differences in confidence as systematic deviations from this theory. This decomposition fit our data better than a linear combination of the stimulus mean and variance (Supplementary Fig. 4). Our work thus provides a robust model of individual differences in confidence, with all parameters stable over time (Fig. 4). Finally, we evaluated the reliability of this computational model of confidence across domains, which suggested a relationship between specific model components and their neural encoding. An implication of our behavioural findings is that neurons representing confidence should receive input both from populations encoding the perceived probability of being correct and from populations encoding uncertainty. Because of differences in connectivity (which are likely to arise during learning and development), different individuals should have different weightings for these two quantities; that is, different values of β_p (the weight of the perceived probability of being correct) and β_I (the weight of Fisher information). That is exactly what we found (Fig. 3). Furthermore, if connectivity changes slowly – a reasonable assumption in the absence of learning – β_p and β_I would be stable over time. Again, that is exactly what we found (Fig. 4). This does not, however, explain the fact that β_p is invariant across tasks whereas β_I is not (Figs. 6a, b). For that, we need to consider how p̂(correct) and uncertainty are encoded. 
Because the probability of being correct is a dimensionless quantity, and is universal across different sources of uncertainty, it is reasonable to assume that it is encoded by domain-general circuitry – for instance, by neurons in the prefrontal cortex8, 10, 30, 31. In contrast, uncertainty – whether it is Fisher information, variance or standard deviation (see Supplementary Notes) – is a quantity with dimension, and so is likely to be encoded by domain-specific populations14. For example, in the case of the visual task, uncertainty could be represented by neurons in primary visual cortex that are tuned to orientation34; and indeed, sensory uncertainty can be decoded from activity in the visual cortex35. In the same manner, numerical uncertainty could be represented by neurons in the parietal cortex tuned to different numerical quantities36, although this has not yet been tested. Under the assumption that the perceived probability of being correct is encoded by domain-invariant populations, the influence of this quantity on confidence should be stable across domains. This would explain our results in Fig. 6a: β_p, the weight of the perceived probability of being correct, was correlated across the visual and numerical tasks. Likewise, under the assumption that uncertainty is encoded by domain-specific populations, the influence of this quantity on confidence should vary across domains. This would explain our results in Fig. 6b: β_I, the weight of Fisher information, was not correlated across the visual and numerical tasks. These are, of course, hypotheses. They do, though, make testable predictions. First, neural circuits encoding confidence should show different functional connectivity with those encoding visual versus numerical uncertainty. Second, different participants should have different relative strengths of these two forms of connectivity, co-varying with their behavioural differences. Future experiments combining behavioural data, computational modelling, and neural recordings could test these predictions. 
The value of investigating individual differences in human behaviour and cognition was first recognised in the psychological sciences, with a special interest in high-level aspects such as intelligence37 and personality38. More recently, technical advances in magnetic resonance imaging have made it possible to develop a cognitive neuroscience of individual differences39, 40. Findings include neural correlates of individual differences in motor behaviour41, visual perception42, mood43, social network size44, and confidence8–10. While these studies provide valuable insights into the neural basis of inter-individual differences in human cognition, the mechanisms responsible for such differences remain unknown. To overcome this limitation, the next challenge is to build a computational neuroscience of individual differences. A first step in this direction is to understand the computations performed by healthy adults leading to inter-individual variability in behaviour. Our study provides a computational model of consistent individual differences in confidence, paving the way towards determining how these computations change under development45, aging46, and psychiatric disorders47.

Methods

Participants

60 healthy adults (aged 18-45, 43 right-handed, 31 female) with normal or corrected-to-normal vision participated in this study. All participants were recruited through advertisement at University College London, and gave written informed consent. We collected data from 94 experimental sessions lasting approximately 90 minutes each. Participants were paid £10 per hour. All experimental procedures were approved by the research ethics committee at University College London.

Display

Stimuli were generated using the Cogent Toolbox (http://www.vislab.ucl.ac.uk/cogent.php) for MATLAB (Mathworks Inc). Participants observed an LCD display (21-inch monitor; refresh rate: 60 Hz; resolution: 1024 × 768 pixels) at a viewing distance of approximately 60 cm.

Experiment 1: Visual task

30 participants performed Experiment 1, which consisted of an orientation averaging task (Fig. 1). Observers viewed a sequence of 30 tilted Gabor patches over a middle grey background (standard deviation of the Gaussian envelope: 0.63 deg; spatial frequency: 1.57 cycles deg⁻¹; contrast: 25%) flashed in rapid succession at the centre of the screen. Each patch was presented for 200 ms with an inter-stimulus interval of 50 ms, resulting in an update rate of 4 Hz. Once the sequence finished, participants were asked to judge whether the mean orientation of the patches was tilted clockwise or counter-clockwise relative to the vertical. The response alternatives consisted of two tilted lines presented in the left and right visual field (size: 2.2 deg, location: 11.3 deg left or right of the centre of the screen). The position of the response alternatives was randomly assigned and counter-balanced across trials. To select the option displayed on the left, participants pressed the ‘Q’ button of a QWERTY keyboard using the left hand; to select the option on the right, they pressed the ‘P’ button. Participants were then asked to report their confidence on a rating scale from 1 to 6. A horizontal line was presented at the centre of the screen (length: 18.9 deg) with 6 equally-spaced marks signalling different levels of confidence. Participants moved a cursor to the left or right of the scale by pressing the ‘Q’ or ‘P’ buttons respectively. The initial point on the scale was randomly chosen on a trial-by-trial basis. Once the participants selected a confidence rating, they pressed the space bar to continue. After an inter-trial interval (which was uniformly distributed between 0.7 and 0.9 seconds), a new trial began. The orientations of the patches were drawn from uniform distributions with mean m and endpoints m±v. 
We used distributions with two different means (m = +3 or -3 degrees) and four different variances (given by their different endpoints: v = 10, 14, 24, or 45 degrees). Uniform distributions were pseudo-randomly sampled such that the mean was exactly ±3 degrees on every trial. This generated weak correlations, but multi-collinearity analyses indicated that presentations could not be predicted from previous samples (R² < 0.07). Orientations were randomly shuffled to define the presentation order. The experiment consisted of 400 trials: 50 trials for each of the eight distributions. Blocked feedback was given every 20 trials by a message displaying the number of correct trials in that block. Each block comprised 5 trials of each variance condition presented in random order. Therefore, performance for different variance conditions could not be learned from feedback.
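
The constraint that every sequence has exactly the nominal mean can be illustrated with a short sketch. The text does not say how the pseudo-random sampling was implemented, so the recentring step below is an assumption, as are the function and parameter names (`sample_sequence`, `half_width`). Recentring can move a few values slightly beyond the nominal endpoints m±v, which a real implementation might handle differently. The sketch is in Python, whereas the experiment itself used MATLAB.

```python
import random

def sample_sequence(mean, half_width, n_items=30, rng=None):
    """Draw n_items values from U(mean - half_width, mean + half_width) and
    shift them so that the sample mean equals `mean` exactly.
    The shift is an assumed method, not the documented one."""
    rng = rng or random.Random()
    values = [rng.uniform(mean - half_width, mean + half_width)
              for _ in range(n_items)]
    shift = mean - sum(values) / n_items      # recentre the sample
    values = [v + shift for v in values]
    rng.shuffle(values)                       # random presentation order
    return values

# One trial of the visual task: m = +3 degrees, v = 10 degrees.
seq = sample_sequence(mean=3.0, half_width=10.0, rng=random.Random(0))
print(len(seq), round(sum(seq) / len(seq), 6))
```

For reference, a uniform distribution with endpoints m±v has variance v²/3, so the four values of v correspond to four distinct stimulus variances.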

Experiment 2: Stability across time

All participants of Experiment 1 were invited to perform the visual task a second time, approximately one month later. 14 participants accepted the invitation and were re-tested. Experiment 2 was performed 35.2±2.4 days after Experiment 1 (range: 23-49 days). Experimenters were blind to the results of Experiment 1 when testing participants in Experiment 2.

Experiment 3: Stability across the perceptual and cognitive domain

20 healthy adults who did not participate in Experiment 1 or 2 performed Experiment 3. Participants performed two sessions: the visual task described in Experiment 1 and a numerical averaging task. Half of the participants performed the visual task first. The second session was performed 9.7±2.9 days (range: 1-27 days) after the first one. Experimenters were blind to the results of the first session when testing the participants in the second session. The numerical task was identical in structure to the visual task but, instead of Gabor patches, two-digit numbers (size: 3.8 deg; font: Arial) were presented. The colour of the numbers (black or white over a middle grey background) was randomly chosen at each presentation. Participants were instructed to decide whether the mean of the sequence was greater or smaller than 50. Numbers were sampled from uniform distributions with mean m = 47 or m = 53, and endpoints m±v were defined by v = 7, 9, 11 or 33. These values were chosen, through pilot experiments with a different set of participants, to obtain performance similar to that observed in Experiment 1. Uniform distributions were pseudo-randomly sampled such that the mean of the sequence was exactly m on each trial. We performed the same multi-collinearity analysis as in Experiment 1, and found that presentations could not be predicted from previous samples (R² < 0.06). Decisions were collected in the same way as in Experiment 1: a response screen with two options (“smaller” and “greater”) was presented on both sides of the visual field. Participants gave their answer, and indicated confidence, using the same keys as in the visual task.

Control Experiment

Ten healthy adults (aged 20-45, 6 female, all right-handed) who had not participated in Experiment 1, 2 or 3 participated in the Control Experiment. The experiment consisted of one visual and one numerical task that subjects performed in a single session of approximately 90 minutes. Half of the participants performed the visual task first. Participants observed a sequence of items serially flashed at the fovea at 4 Hz, and were asked to provide their analog estimate of the mean. To rate their confidence, participants moved a cursor over a continuous horizontal line. All other parameters (length of the sequence, colour, contrast, brightness, viewing distance, etc.) were identical to our main study. In the visual task, participants observed tilted Gabor patches. The mean of the distribution was uniformly sampled across the entire circle. After observing 30 items, we presented a line in the centre of the screen, initialized at a random orientation. Participants then moved the mouse horizontally to change its orientation until they matched the perceived mean in the sequence. In the numerical task, participants observed two-digit numbers. We uniformly sampled the mean between 44 and 66 (to ensure that all numbers were between 11 and 99 in the condition with higher variance). Participants typed their answer using a keyboard.

Model fitting

To fit the stochastic updating model (Equations [1] and [2]) to the participants’ decisions, we find, for each individual, the parameters λ and γ that maximise the log likelihood,

$$ L(\lambda,\gamma) = \sum_{k=1}^{N} \log \Phi\!\left( \frac{d_k\,\bar{\mu}_{30,k}}{\sigma_{30,k}(\lambda,\gamma)} \right) \qquad [4] $$

where Φ is the standard cumulative normal function, d_k is the decision on trial k (+1 if clockwise, -1 if counter-clockwise), σ_{30,k}(λ,γ) is obtained from Equation [2], N is the number of trials, and μ̄_{30,k} is the mean value of μ30 on trial k. (A minor technical point: Equation [4] describes the visual task; the cognitive task is the same except that the mean is offset by 50.)
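
As a minimal sketch of how such a log likelihood could be evaluated, assume it takes the form of a sum over trials of log Φ(d·μ̄30/σ30), which is what the definitions above imply. The per-trial means and standard deviations are taken as given inputs here, because Equations [1] and [2] that produce them are not reproduced in this section. The function names and the `offset` argument (0 for the visual task, 50 for the numerical task) are illustrative assumptions.

```python
import math

def std_normal_cdf(x):
    """Standard cumulative normal function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_likelihood(decisions, mean_estimates, sd_estimates, offset=0.0):
    """Sum over trials of log Phi(d_k * (mu_bar_k - offset) / sigma_k).

    decisions      : +1 or -1 on each trial
    mean_estimates : mean of the final estimate (mu_bar_30) on each trial
    sd_estimates   : its standard deviation (sigma_30) on each trial
    offset         : category boundary (assumed 0 for orientation, 50 for numbers)
    """
    total = 0.0
    for d, mu_bar, sigma in zip(decisions, mean_estimates, sd_estimates):
        p = std_normal_cdf(d * (mu_bar - offset) / sigma)
        total += math.log(max(p, 1e-300))   # guard against log(0)
    return total

# Two hypothetical trials: decisions that agree with the sign of the mean
# estimate are more likely than decisions that oppose it.
print(log_likelihood([+1, -1], [2.0, -1.0], [4.0, 4.0]))
print(log_likelihood([-1, +1], [2.0, -1.0], [4.0, 4.0]))
```

In a full fit, this quantity would be maximised over λ and γ with a numerical optimiser, recomputing the per-trial means and standard deviations from the model at each step.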

Estimating the Fisher information and the perceived probability of being correct

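Based on the best fitting parameters λ and γ derived from the stochastic updating model (the values of λ and γ that maximise L(λ,γ) in Equation [4]), we estimated, on a trial-by-trial basis, the observed Fisher information and the expected perceived probability of being correct. The observed Fisher information is just the inverse variance of the participants’ estimate, the latter computed via Equation [2] (Fig. 2a). The expected perceived probability of having made a correct decision, given the decision d, is

$$ \hat{p}(\mathrm{correct} \mid \bar{\mu}_{30}, \sigma_{30}, d) = \int d\mu_{30}\; \hat{p}(\mathrm{correct} \mid \mu_{30}, \sigma_{30})\; p(\mu_{30} \mid \bar{\mu}_{30}, \sigma_{30}, d) \qquad [5] $$

The first term inside the integral, p̂(correct|μ30, σ30), is the shaded area under the Gaussian in Fig. 2a; consequently, it is given by the cumulative normal distribution,

$$ \hat{p}(\mathrm{correct} \mid \mu_{30}, \sigma_{30}) = \Phi\!\left( \frac{|\mu_{30}|}{\sigma_{30}} \right) \qquad [6] $$

The second term in the integral, p(μ30|μ̄30,σ30,d), is the probability of observing μ30 given μ̄30, σ30, and, importantly, the decision, d. If the decision is clockwise (d = +1), μ30 must be positive, whereas if the decision is counter-clockwise (d = −1), μ30 must be negative. We can take these constraints into account using the Heaviside step function, Θ(x) (which is 1 if x > 0 and 0 otherwise), yielding

$$ p(\mu_{30} \mid \bar{\mu}_{30}, \sigma_{30}, d) = \frac{1}{Z}\, \Theta(\mu_{30} d)\, \exp\!\left( -\frac{(\mu_{30} - \bar{\mu}_{30})^2}{2\sigma_{30}^2} \right) \qquad [7] $$

where Z is the normalisation constant,

$$ Z = \int d\mu_{30}\; \Theta(\mu_{30} d)\, \exp\!\left( -\frac{(\mu_{30} - \bar{\mu}_{30})^2}{2\sigma_{30}^2} \right) \qquad [8] $$

Combining these two expressions, we have

$$ \hat{p}(\mathrm{correct} \mid \bar{\mu}_{30}, \sigma_{30}, d) = \frac{1}{Z} \int d\mu_{30}\; \Theta(\mu_{30} d)\, \Phi\!\left( \frac{|\mu_{30}|}{\sigma_{30}} \right) \exp\!\left( -\frac{(\mu_{30} - \bar{\mu}_{30})^2}{2\sigma_{30}^2} \right) \qquad [9] $$

On each trial, p̂(correct|μ̄30, σ30, d) was computed numerically using MATLAB. Note that the expected perceived probability of being correct (Equation [9]) depends on the decision, d, whereas the Fisher information (Equation [2], Fig. 2a) does not, and so is choice-independent.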
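
The numerical step can be sketched as follows. The sketch assumes that the per-estimate probability of being correct is Φ(|μ30|/σ30) and that the distribution of μ30 is a Gaussian centred on μ̄30 with standard deviation σ30, truncated to the half-line consistent with the decision, as the description above implies. The midpoint rule, the integration range, and all names are illustrative choices; the original computation was done in MATLAB and its details are not given.

```python
import math

def std_normal_cdf(x):
    """Standard cumulative normal function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_p_correct(mu_bar, sigma, d, n_steps=4000, n_sd=8.0):
    """Expected perceived probability of being correct, given decision d (+1 or -1).

    Integrates Phi(|mu30| / sigma) against a Gaussian in mu30 (mean mu_bar,
    standard deviation sigma) restricted to mu30 * d > 0, using a midpoint rule.
    """
    upper = max(0.0, d * mu_bar) + n_sd * sigma   # covers the relevant mass
    dx = upper / n_steps
    numerator = 0.0
    normaliser = 0.0                              # plays the role of Z
    for i in range(n_steps):
        x = (i + 0.5) * dx                        # x = |mu30|
        mu30 = d * x                              # sign fixed by the decision
        weight = math.exp(-0.5 * ((mu30 - mu_bar) / sigma) ** 2)
        numerator += std_normal_cdf(x / sigma) * weight
        normaliser += weight
    return numerator / normaliser                 # dx cancels in the ratio

# Same stimulus statistics, opposite decisions: the value depends on d.
print(expected_p_correct(3.0, 5.0, +1))
print(expected_p_correct(3.0, 5.0, -1))
```

The Fisher information for the same trial would simply be `1 / sigma**2`, which involves no decision term. For decisions that are extremely unlikely under the model, the Gaussian weights can underflow to zero, so a production version would need a more careful treatment of the tail.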

Ordinal regression of confidence reports

For each individual, we ran a multivariate ordinal regression21. For each of the five possible splits of the rating scale (indexed by j, with 1 ≤ j ≤ 5), this regression fits a logistic model that predicts on which side of split j the confidence rating c falls, with weights that are shared across splits (fixed effects) and a different offset α_j for each split. The predictors Z_p and Z_I are the z-scored estimates of the perceived probability of being correct and of the Fisher information on each trial. The outputs of this regression are the offsets α1,…,α5 and the weights β_p and β_I. To summarise the computations underlying confidence, we selected α3 (the offset when splitting the scale in halves, which we refer to as the overall confidence), β_p (the weight of the probability of being correct on confidence) and β_I (the weight of information on confidence).
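
To make the structure of this model concrete, the sketch below turns a set of offsets and weights into predicted probabilities for the six ratings. It assumes the cumulative-logit convention logit P(c > j) = α_j + β_p·Z_p + β_I·Z_I, chosen so that larger weights and a larger α3 correspond to higher confidence; the sign convention actually used in the fit is not stated in this section and may differ. The parameter values are made up for illustration.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def rating_probabilities(alphas, beta_p, beta_i, z_p, z_i):
    """Return [P(c=1), ..., P(c=6)] under the assumed convention
    logit P(c > j) = alpha_j + beta_p * z_p + beta_i * z_i, for j = 1..5.
    The offsets must decrease with j for the probabilities to be valid."""
    drive = beta_p * z_p + beta_i * z_i
    p_greater = [logistic(a + drive) for a in alphas]   # P(c > j), j = 1..5
    bounds = [1.0] + p_greater + [0.0]                  # P(c > 0) ... P(c > 6)
    return [bounds[j] - bounds[j + 1] for j in range(6)]

def mean_rating(probs):
    return sum((j + 1) * p for j, p in enumerate(probs))

alphas = [2.0, 1.0, 0.0, -1.0, -2.0]    # hypothetical offsets; alpha_3 = 0
low = rating_probabilities(alphas, beta_p=1.0, beta_i=0.5, z_p=-1.0, z_i=0.0)
high = rating_probabilities(alphas, beta_p=1.0, beta_i=0.5, z_p=+1.0, z_i=0.0)
print(round(mean_rating(low), 3), round(mean_rating(high), 3))
```

Under this convention, α3 sets the probability of a rating in the upper half of the scale when both predictors are at their mean, which matches its description as the overall confidence.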

Statistical analyses

In Experiment 1, we computed the average performance for each variance condition and each participant. These values were submitted to a one-way repeated measures analysis of variance (rm-ANOVA) with factor “variance condition” (4 levels) and “participant” (30 levels) as repeated measure (Fig. 1). The normality assumption of this test was checked using the Lilliefors test (k=0.7, c=0.8, p=0.07). We also computed the average confidence rating for each variance condition and each participant, conditioned on correct or incorrect trials, and submitted those values to a two-way rm-ANOVA with factors “variance condition” (4 levels), “outcome” (2 levels: correct or incorrect), and “participant” (30 levels) as repeated measure (Fig. 2c). The normality assumption of this test was checked using the Lilliefors test (k=0.04, c=0.06, p>0.5). The goodness of the fit for each model and subject (Supplementary Fig. 1b), quantified by the negative log-likelihood (Equation [4]), was submitted to a two-sided paired t-test (29 degrees of freedom). The normality assumption of this test was checked using the Lilliefors test (k=0.08, c=0.11, p>0.5). In Experiment 2, we compared the within-subjects distances in the space defined by (β_p, β_I, α3) with the between-subjects distances. Because we have 14 participants, this defines 14 within-subjects distances and 14×13/2=91 between-subjects distances. We z-scored each dimension and used the Euclidean metric to compute distance. The Lilliefors test rejected the null hypothesis that these values were normal (k=0.1, c=0.08, p=0.01); therefore, we used a non-parametric test, the Wilcoxon rank-sum test. This test is unpaired and the reported p-value is two-sided. In Experiment 3, we computed the average performance for each variance condition, task, and participant (Fig. 5a,b). We submitted these values to a two-way rm-ANOVA with factors “variance condition” (4 levels), “task” (2 levels), and “participants” (20 levels) as repeated measure. 
The normality assumption of this test was checked using the Lilliefors test (k=0.07, c=0.09, p=0.36). We computed the average confidence rating across all conditions and participants and performed the same rm-ANOVA used in Experiment 1 (Fig. 5c,d). As in Experiment 1, average confidence was normally distributed (Lilliefors test, k=0.06, c=0.07, p=0.17). To evaluate the stability of (β_p, β_I, α3) across domains, we computed the within- and between-subjects distances following the same procedure as in Experiment 2, and compared these values using the same non-parametric test.
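
The distance analysis can be sketched as follows. Each participant contributes one three-dimensional point per session (the two regression weights and α3); every dimension is z-scored and distances are Euclidean. How the 91 between-subjects pairs were formed is not specified, so pairing session 1 of one participant with session 2 of another, and z-scoring over both sessions pooled, are assumptions made for this sketch. The data are synthetic and the names are illustrative.

```python
import math
import random
from itertools import combinations

def zscore_columns(points):
    """Z-score each dimension (column) across all points."""
    n, dims = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(dims)]
    sds = [math.sqrt(sum((p[k] - means[k]) ** 2 for p in points) / (n - 1))
           for k in range(dims)]
    return [tuple((p[k] - means[k]) / sds[k] for k in range(dims))
            for p in points]

def within_between_distances(session1, session2):
    """Euclidean distances in z-scored parameter space.

    within  : same participant, session 1 vs session 2 (n values)
    between : participant i in session 1 vs participant j in session 2,
              for i < j (n * (n - 1) / 2 values; an assumed pairing)
    """
    n = len(session1)
    z = zscore_columns(list(session1) + list(session2))
    z1, z2 = z[:n], z[n:]
    within = [math.dist(a, b) for a, b in zip(z1, z2)]
    between = [math.dist(z1[i], z2[j]) for i, j in combinations(range(n), 2)]
    return within, between

# Synthetic data: 14 participants whose parameters drift slightly between sessions.
rng = random.Random(0)
session1 = [(rng.gauss(1.0, 0.5), rng.gauss(0.3, 0.3), rng.gauss(0.0, 1.0))
            for _ in range(14)]
session2 = [tuple(x + rng.gauss(0.0, 0.1) for x in p) for p in session1]
within, between = within_between_distances(session1, session2)
print(len(within), len(between))  # 14 91
```

The two lists would then be compared with an unpaired, two-sided rank-sum test; stable individual differences show up as within-subjects distances that are smaller than between-subjects distances.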
