
Human observers have optimal introspective access to perceptual processes even for visually masked stimuli.

Abstract

Many believe that humans can 'perceive unconsciously' - that for weak stimuli, briefly presented and masked, above-chance discrimination is possible without awareness. Interestingly, an online survey reveals that most experts in the field recognize the lack of convincing evidence for this phenomenon, and yet they persist in this belief. Using a recently developed bias-free experimental procedure for measuring subjective introspection (confidence), we found no evidence for unconscious perception; participants' behavior matched that of a Bayesian ideal observer, even though the stimuli were visually masked. This surprising finding suggests that the thresholds for subjective awareness and objective discrimination are effectively the same: if objective task performance is above chance, there is likely conscious experience. These findings shed new light on decades-old methodological issues regarding what it takes to consider a neurobiological or behavioral effect to be 'unconscious,' and provide a platform for rigorously investigating unconscious perception in future studies.


Keywords: Bayesian ideal observer; blindsight; consciousness; criterion response bias; human; neuroscience; subliminal perception; unconscious perception


Year: 2015 PMID: 26433023 PMCID: PMC4749556 DOI: 10.7554/eLife.09651

Source DB: PubMed Journal: Elife ISSN: 2050-084X Impact factor: 8.140

Introduction

Above-chance performance without awareness in perceptual discrimination tasks is a strong form of unconscious perception. In these demonstrations (e.g., blindsight: Weiskrantz, 1986) the subjective threshold for awareness (when a stimulus is consciously ‘seen’) seems well above the objective threshold for forced-choice discrimination (when a stimulus can be correctly identified): subjects can discriminate a target above chance performance, yet report no awareness of the target. Many researchers believe normal, healthy subjects can also directly discriminate near-threshold, low-intensity targets without subjective awareness (e.g., Boyer et al., 2005; Charles et al., 2013; Merikle et al., 2001; but see Snodgrass et al., 2004 for an opposing view). We conducted an informal survey to confirm this popular belief, which also revealed that many believe convincing evidence for this phenomenon is lacking. We asked survey participants three key questions: (1) "Do you believe in subliminal perception?" (2) "Do you believe that the subjective threshold for awareness is above the objective discrimination threshold?" and (3) "If ‘yes’, do you believe this has been convincingly demonstrated in the literature?" Most respondents reported believing that subliminal processing exists (94%), but also that they did not believe it had been convincingly demonstrated in the literature (64%). These belief patterns were shown even among those who reported having published on subliminal or unconscious perception (94% and 61%, respectively). See Appendix 1 for full text of questions and detailed survey results. A primary culprit in this controversy is the problem of criterion bias: an observer’s report of ‘unseen’ doesn’t necessarily imply complete lack of awareness, only that the stimulus’ strength fell below some arbitrary boundary for reporting ‘seen’ (Eriksen, 1960; Hannula et al., 2005; Lloyd et al., 2013; Merikle et al., 2001). 
Unfortunately, most methods of studying unconscious perception suffer from this ‘criterion problem’ (e.g., Charles et al., 2013; Jachs et al., 2015; Ramsøy and Overgaard, 2004). With such methods, one could argue that reports of ‘unawareness’ may only mean some stimuli are relatively hard to perceive compared to those that are clearly visible. To avoid this criterion problem, several groups (Kolb and Braun, 1995; Kunimoto et al., 2001) sought to identify conditions in which confidence was uncorrelated with accuracy, which they argued would indicate no subjective awareness of the target. Unfortunately, some of these efforts were not replicable (Morgan et al., 1997; Robichaud and Stelmach, 2003). Others revealed that estimating the correspondence between confidence and accuracy requires mathematical considerations more complicated than originally envisaged (Evans and Azzopardi, 2007; Galvin et al., 2003; Maniscalco and Lau, 2012). Importantly, the conceptual link between metacognitive sensitivity (i.e., correlation between confidence and accuracy) and conscious awareness is itself controversial (Charles et al., 2013; Fleming and Lau, 2014; Jachs et al., 2015). Here, we employ a recently-developed confidence-rating method to address this problem (Barthelmé et al., 2009; de Gardelle and Mamassian, 2014). Subjects discriminated two stimulus intervals, only one of which contained a target, and indicated confidence in their decisions using a 2-interval forced-choice procedure (2IFC), that is, indicating which of the two discrimination decisions they felt more confident in. This approach has several advantages. First, 2IFC tasks depend little on response bias compared to multi-point confidence-rating scales. Maintaining the criteria for extensive confidence scales may also be demanding, leading subjects to respond somewhat randomly in conditions of vague awareness and thereby producing the negative result Kolb and Braun (1995) observed (Morgan et al., 1997). 
Second, the interpretation of 2IFC confidence-rating in this context is straightforward: ‘Performance without Awareness’ would mean subjects can perform the target discrimination yet fail to place bets appropriately to distinguish this performance from discrimination of a blank stimulus (which guarantees chance performance). That is, following psychophysics traditions (Kolb and Braun, 1995; Peirce and Jastrow, 1884), if a certain above-chance discrimination seems introspectively no different from a random guess based on no stimulus at all (as reflected by betting behavior), we interpret the discrimination to be unconscious. Here, we explored whether such Performance without Awareness occurs in normal observers in two behavioral experiments, and compared these results to predictions of a Bayesian ideal observer.

Results

Behavioral experiments

Nine human observers participated in two experiments of our 2IFC confidence-rating paradigm (Figure 1). In both experiments, participants viewed two intervals in which they were required to discriminate the orientation (right or left tilt) of a Gabor patch target embedded in forward- and backward-masks (Figure 1A,B), and judged which of the discrimination choices they felt more confident in. Crucially, in one of the intervals the target was absent (Figure 1B), such that above-chance discrimination performance was impossible. We performed two experiments to assess the potential contributions of question order, receipt of feedback, and a priori knowledge of the presence of a target-absent interval (Figure 1C). In Experiment 1, participants judged which decision they felt more confident in and then indicated their orientation decisions for both intervals, while in Experiment 2 they indicated their orientation discrimination decisions before selecting the more-confident interval. In Experiment 2, we also provided feedback on the confidence decision, and told participants that one interval contained no target; this information was withheld from participants in Experiment 1. Stimuli, timing details, and order of question prompts in the two experiments are also discussed in greater detail in the Methods section.

Figure 1.

Stimuli and procedures for the 2IFC confidence-rating task.

DOI: http://dx.doi.org/10.7554/eLife.09651.003


(A) Targets consisted of oriented (45° left- or right-tilted from vertical) Gabor patches presented at multiple near-threshold contrast levels; masks consisted of bandpass-noise filtered random RGB values (see Materials and methods). (B) Each trial consists of two intervals of discrimination in which the target stimulus (T) was forward- and backward-masked (M). Gabor patch targets were presented only in target-present (TP) intervals; in target-absent (TA) intervals, the target was replaced with blank frames. Otherwise timings of stimuli were matched between the two intervals. (C) Experimental tasks. Experiment 1 required subjects to bet on which discrimination they felt more confident before they indicated their orientation discrimination choices (left or right tilt of the Gabor) sequentially for both intervals. Shown is an example trial in which TP is presented before TA; in the experiment this order varied randomly from trial to trial. In Experiment 2, subjects bet on the more confident interval after the discriminations, and feedback was given. (See Materials and methods for more details.)

For both experiments, we evaluated whether participants exhibited Performance without Awareness (Figure 2A) or Performance > Awareness (Figure 2B). In both cases, the response pattern of interest can be visualized as percent of time betting on the target-present interval as a function of percent correct orientation discrimination in the target-present interval. ‘Performance without Awareness’ (Figure 2A) would be supported if observers can discriminate the target above chance (>50% accuracy) while being unable to bet on their choices more often than betting on the target-absent interval (which necessarily yields chance-level performance).
That is, observers correctly discriminate the target’s orientation more than 50% of the time, but bet on the target-present interval 50% of the time (i.e., they bet randomly on the target-present versus target-absent interval), indicating they are not aware of the information that contributed to their discrimination decision. If this were to occur, it would most likely happen at low discrimination performance levels, yielding a pattern of behavior similar to that presented in Figure 2A.

Figure 2.

Schematic explanation of predictions of the experiments.

DOI: http://dx.doi.org/10.7554/eLife.09651.004


(A) A ‘Performance without Awareness’ pattern of behavior, in which subjects are able to discriminate the target above chance while betting on the target-present interval at chance. (B) A ‘Performance > Awareness’ pattern of behavior, in which subjects are less able to bet on their discrimination decisions than they are able to correctly discriminate the target. In both (A) and (B), the diagonal dashed line indicates where rate of betting on the target-present interval equals objective discrimination performance.

However, in psychophysics, thresholds can also be defined as midway between ceiling and floor performance (Macmillan and Creelman, 2004), such that threshold discrimination performance is defined as 75% accuracy rather than >50% (chance level). This concept can also be applied to subjective betting data in the sense that betting on the target-present interval could be considered ‘correct’ or ‘advantageous’ betting. In this sense (threshold = 75% correct performance), the subjective threshold for confidence might be above the objective threshold for discrimination. In other words, observers may bet on the target-present interval less often than they get the discrimination correct, but still above chance. This would occur because the orientation discrimination choice requires evaluation of only one interval (the one with the target in it) and therefore is subject to only one source of uncertainty, but the ‘betting’ choice requires evaluation of both intervals, and therefore has two potential sources of uncertainty. This pattern of behavior (Figure 2B) may occur even if subjects do not display Performance without Awareness, and would be characterized by a pattern of responses that fall below the identity line (diagonal dashed line). We call this possibility ‘Performance > Awareness’.
We discuss the results of both experiments together for ease of interpretation, and because the results are very similar (Figure 3A–F). To anticipate, we found no evidence of Performance without Awareness. Although we found strong evidence of Performance > Awareness across the experiments (Figure 3A,D), subsequent computational modeling (Bayesian Ideal Observer Model section) suggests that this is somewhat trivial: even an ideal observer is expected to show Performance > Awareness (Figure 3G; see Bayesian Ideal Observer Model section for further explanation).

Figure 3.

Group-level results of behavioral experiments (rows 1 and 2), presented in comparison to the predictions of the Bayesian ideal observer model (row 3; see Materials and methods - Computational Model).

DOI: http://dx.doi.org/10.7554/eLife.09651.005


In both experiments, human observers displayed no evidence of Performance without Awareness, but appeared to demonstrate Performance > Awareness (panels A and D). However, the ideal observer model also demonstrated such behavior (panel G), indicating that it is not suboptimal at all but arises from the 2IFC nature of the confidence task (see Bayesian Ideal Observer Model results section and Figure 2 caption for explanation). Horizontal gray lines in panels A, D, and G indicate chance-level betting (50%) on the target-present (TP) interval. Panels B, E, and H show rising Type 2 hit rate (‘HR’; when subjects bet on a correct orientation discrimination choice) but relatively flat Type 2 false alarm rate (‘FAR’; when subjects bet on an incorrect orientation discrimination choice), and panels C, F, and I show higher orientation discrimination accuracy when the target-present (TP) interval is bet on; these patterns suggest that human subjects and the Bayesian ideal observer were rating confidence via assessing their probability of correctly discriminating orientation, rather than target presence versus absence only. The model demonstrates good explanatory power for the data across all participants (mean proportion of variance accounted for by the model, R² = 0.565). Error bars for behavioral data indicate the standard error of the mean across subjects with data in each bin.

To look for evidence of Performance without Awareness, we first plotted percent of trials in which observers bet on the target-present interval against orientation discrimination accuracy for both experiments (Figure 3A,D).
In contrast to what might have been suggested based on previous results (e.g., Boyer et al., 2005; Charles et al., 2013; Merikle et al., 2001; but see Snodgrass et al., 2004), visual inspection alone clearly reveals no evidence for Performance without Awareness in either experiment: it looks as though observers could bet on the target-present interval above chance as soon as they were able to discriminate the target above chance, and there is no hint of the Performance without Awareness pattern. We quantitatively assessed the possibility of Performance without Awareness using a Bayesian observer model (see Modeling Results, below), but found no evidence that a Performance without Awareness pattern could capture human behavior. Individual subjects’ performance closely resembles group data and averages (Appendix 2). Because thresholds can be defined in psychophysical terms (75% performance) rather than absolute terms (>50%), we also evaluated the possibility of Performance > Awareness. We used kernel smoothing regression (see Materials and methods) to interpolate each individual subject’s data in order to estimate how often subjects bet on the target-present interval when they were performing at 75% correct on orientation discrimination. Because results are very similar across the two experiments, we combined results from both and performed a two-tailed one-sample t-test to assess whether this predicted percentage betting on the target-present interval significantly diverged from 75%. This analysis revealed that observers bet on the target-present interval significantly less than 75% of the time at 75% correct orientation discrimination accuracy (Figure 3A,D, Table 1). Thus, observers exhibited Performance > Awareness (but see also Modeling Results, below).
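The interpolation step described above can be sketched with a Nadaraya-Watson kernel smoother. This is only an illustration: the Gaussian kernel, the bandwidth of 0.05, the function name `kernel_smooth`, and the binned data points are all assumptions made here, not the values or implementation used in the study (those are specified in Materials and methods).

```python
import math


def kernel_smooth(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of y at x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical binned data for one subject (NOT taken from the paper):
# orientation discrimination accuracy and proportion of bets on the TP interval.
accuracy = [0.55, 0.65, 0.75, 0.85, 0.95]
bet_on_tp = [0.52, 0.60, 0.68, 0.78, 0.90]

# Estimated betting rate at the 75%-correct threshold for this made-up subject.
estimate = kernel_smooth(0.75, accuracy, bet_on_tp, bandwidth=0.05)
print(f"estimated p(bet on TP) at p(correct) = 0.75: {estimate:.3f}")
```

One such estimate per subject would then be the input to the one-sample t-test against 0.75.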

Table 1.

Individual values, means, standard deviations, and p-values for t-tests showing that Performance > Awareness occurs across both experiments. Results from Experiment 2 show that the pattern does not change with different question order or feedback.

DOI: http://dx.doi.org/10.7554/eLife.09651.006

Expt	Subject		p(choose TP interval) at p(correct) = 0.75
1	1	AVT	0.676
	2	AM	0.714
	3	JDK	0.716
	4	SH	0.682
	5	MM	0.684
	6	AC	0.685
	7	MR	0.674
	8	MK	0.658
	9	RA	0.619
2	1	AVT	0.666
	2	AM	0.713
	3	JDK	0.746
Mean (σ)			0.686 (0.033)
t(11)			6.718
p			0.00003
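
The summary statistics in Table 1 can be rechecked from the individual values. The sketch below recomputes the mean, the sample standard deviation, and the one-sample t statistic against 0.75 using only the Python standard library; variable names are illustrative. Because the table entries are rounded to three decimals, the t statistic should come out close to, but not necessarily identical to, the reported magnitude of 6.718, and its sign is negative because the mean lies below 0.75. The p-value is not recomputed here because that requires a t-distribution CDF, which the standard library does not provide.

```python
import math
import statistics

# p(choose TP interval) at p(correct) = 0.75, copied from Table 1
# (Experiment 1 subjects 1-9, then Experiment 2 subjects 1-3).
p_choose_tp = [0.676, 0.714, 0.716, 0.682, 0.684, 0.685,
               0.674, 0.658, 0.619, 0.666, 0.713, 0.746]

n = len(p_choose_tp)
mean = statistics.mean(p_choose_tp)
sd = statistics.stdev(p_choose_tp)  # sample SD (n - 1 in the denominator)

# One-sample t statistic for the null hypothesis that the mean equals 0.75.
t_stat = (mean - 0.75) / (sd / math.sqrt(n))

print(f"mean = {mean:.3f}, sd = {sd:.3f}, t({n - 1}) = {t_stat:.3f}")
```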

2IFC detection?

One possible concern is that subjects were not rating confidence but instead engaging in 2IFC detection of the target-present interval. To confirm that subjects were indeed rating confidence, we plotted Type 2 hit rate and Type 2 false alarm rate against orientation discrimination accuracy (Figure 3B,E). A Type 2 hit is defined as placing a bet on a correct orientation discrimination decision, whereas a Type 2 false alarm is defined as placing a bet on an incorrect orientation discrimination decision. These are in contrast to Type 1 hits and false alarms, which can be defined as saying ‘left’ when a left-tilted Gabor was presented and saying ‘left’ when a right-tilted Gabor was presented, respectively, according to standard signal detection theoretic definitions (Green and Swets, 1966; Macmillan and Creelman, 2004). Subjects displayed increasing Type 2 hit rate as a function of orientation discrimination accuracy, whereas Type 2 false alarm rate remained relatively flat at around 50% (chance level) across increasing orientation discrimination accuracy. In other words, subjects did not bet on orientation discrimination choices they expected to get wrong, even at high performance (i.e., high contrast) levels. Thus, they were probably truly rating confidence and not simply engaging in 2IFC detection. In keeping with this observation, we also plotted orientation discrimination accuracy conditional upon whether subjects selected the target-present interval, i.e., p(correct | bet on TP) and p(correct | bet on TA) (Figure 3C,F). This visualization revealed that subjects were worse at orientation discrimination when they did not select the target-present interval. This result is in keeping with typical observations of worse objective performance for low confidence trials, since not betting on the target-present interval is essentially an indication of low confidence in that discrimination choice. See also the section "Unconscious ‘hunches’?", below.
Notably, the similarity in participants’ behavior between Experiments 1 and 2 reveals that receipt of feedback on confidence judgments, knowledge that one interval is physically blank, question order, and ability to monitor reaction time do not affect behavioral outcomes.

Unconscious ‘hunches’?

Throughout this report, we define conscious awareness of the target to occur when introspective assessment of the correctness of an orientation discrimination choice can differentiate between a target being present or not. In this sense, observers are unconscious of the information contributing to their decision if they can discriminate a target above chance, but doing so feels no different introspectively from discriminating (or guessing about) nothing at all. However, one concern might be that subjects are able to meaningfully rate confidence despite no subjective visual experience of the stimulus due to some sort of non-visual ‘hunch’ or ‘feeling’. Indeed, such metacognitive insights (the ability to introspectively distinguish between correct and incorrect responses) have recently been reported even in the absence of objective task performance sensitivity, although not in the context of perception (e.g., Scott et al., 2014). We think this issue is essentially one of terminology; our definition of conscious awareness follows a long history in psychology and psychophysics traditions in relating the ability to meaningfully rate confidence to subjective awareness (c.f. Kolb and Braun, 1995; Peirce and Jastrow, 1884), according to which, strictly speaking, a non-visual hunch is also defined as conscious so long as it meaningfully tracks visual processes; regardless of whether such ‘hunches’ are visual in nature, it is still meaningful to distinguish between having such introspective insight versus having no insight whatsoever. However, we also ran a control study in which the subjective task was to indicate which interval appeared more visible rather than confidence in the corresponding discrimination. In other words, it was akin to a 2IFC detection task rather than a metacognitive judgment. 
Results of this control study (Appendix 3) mirrored those of the main experiments: as soon as participants were able to discriminate the target above chance, they were able to indicate which interval contained the target above chance. Thus, even when the 2IFC task was visibility judgment rather than confidence, subjects’ behavior was inconsistent with the Performance without Awareness pattern -- suggesting there is also no Performance without Visual Awareness. See Appendix 3 for details of the control study.

Bayesian ideal observer model

We developed a Bayesian ideal observer model utilizing a similar representation space as standard 2-dimensional signal detection theory (Figure 4) (King and Dehaene, 2014; Macmillan and Creelman, 2004). The primary finding is that even an ideal observer model exhibits Performance > Awareness, as depicted in Figure 2B. Intuitively, this effect occurs because the orientation discrimination choice requires evaluation of only one interval (the one with the target in it) and therefore is corrupted by only one source of noise, but the ‘betting’ choice requires evaluation of both intervals, and therefore has two potential sources of noise.
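
The one-versus-two noise sources intuition can be made concrete with a deliberately simplified observer. This is an assumption made for illustration, not the model fitted in this paper: the observer knows the target strength c, and evidence is unit-variance Gaussian on two orthogonal axes with the noise distribution at the origin. In that case confidence increases with the absolute difference D between the two evidence axes, so the discrimination is correct when the target-present difference is positive, giving p(correct) = Φ(c/√2). The observer bets on the target-present interval when |D_TP| > |D_TA|. Because D_TP − D_TA and D_TP + D_TA are independent Gaussians with mean c and standard deviation 2, this gives p(bet on TP) = Φ(c/2)² + (1 − Φ(c/2))². The function names below are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ideal_observer_rates(c):
    """Return (p_correct, p_bet_tp) for target strength c in noise-SD units.

    Simplified equal-variance observer with known c (an assumption of this
    sketch, not the paper's full model, which marginalizes over contrast).
    """
    p_correct = phi(c / math.sqrt(2.0))
    a = phi(c / 2.0)
    p_bet_tp = a * a + (1.0 - a) * (1.0 - a)
    return p_correct, p_bet_tp


for c in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    p_correct, p_bet_tp = ideal_observer_rates(c)
    print(f"c = {c:.1f}  p(correct) = {p_correct:.3f}  p(bet on TP) = {p_bet_tp:.3f}")
```

For every c > 0 the betting rate in this sketch is above 50% but below p(correct), which is the qualitative Performance > Awareness pattern; the exact numbers should not be expected to match the fitted model.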

Figure 4.

Illustration of the Bayesian ideal observer’s 2-dimensional representation space, following standard 2-dimensional signal detection theory (King and Dehaene, 2014; Macmillan and Creelman, 2004).

DOI: http://dx.doi.org/10.7554/eLife.09651.007


(a) The source distributions S for left- and right-tilted targets lie on orthogonal axes c, one axis per orientation, and the noise distribution lies at the origin. On each simulated trial, the model ‘sees’ two samples d, one drawn from a source distribution S to represent the target-present interval and the other from the noise distribution to represent the target-absent interval. It marginalizes across all contrast evidence levels to guess the orientations of both samples according to the posterior probabilities of left- and right-tilted sources. Then, it compares the posterior probabilities of the chosen orientations in each interval to select the interval with higher confidence (p(correct)) (see Materials and methods - Bayesian ideal observer model).

The model ‘performs’ a 2IFC confidence discrimination by comparing the posterior probability of left- or right-tilted source distributions given the data to perform the orientation discrimination task on each of the two intervals on each trial. Then, it uses the posterior probability of the choice it made on each interval as a measure of confidence (i.e., p(correct)), and compares this measure between the two intervals to select the choice it is more confident in (see Figure 4 and Materials and methods – Bayesian ideal observer model). We also explored several model variants to establish the robustness of the model’s performance; see Appendix 4 for details on model variants. Unsurprisingly, the Bayesian ideal observer did not display signs of Performance without Awareness. We next evaluated whether causing the model to exhibit Performance without Awareness (Figure 2A) by degrading the 2IFC confidence judgment could produce better fit to participants’ data.
We tested three levels of increasing decisional noise (σ; see Materials and methods) to cause the model to exhibit increasing Performance without Awareness as described in Figure 2A, and assessed the goodness of fit (R²) for each subject for each decisional noise value. We found that causing the model to exhibit increasing Performance without Awareness behavior resulted in increasingly worse R² values (Table 2). To confirm this trend, we conducted a 12 (subjects; subjects 1–3 who completed both experiments are treated independently) × 4 (decisional noise magnitude) repeated measures ANOVA on the R² values. This analysis revealed a main effect of decisional noise (F(3,33) = 19.301, p < 0.001), indicating that the ideal observer model (σ = 0) best captures human performance, and that any suboptimal Performance without Awareness (σ > 0) pattern fits human data more poorly than the ideal observer behavior – even without punishing the decisional noise model for having an additional parameter.
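
How decisional noise pushes an observer toward Performance without Awareness can be illustrated with a small Monte Carlo sketch. Several things here are assumptions for illustration rather than the procedure described in Materials and methods: the observer knows the target strength `c`, evidence is unit-variance Gaussian on two orthogonal axes, and decisional noise is modeled as zero-mean Gaussian noise with standard deviation `sigma_d` added to the difference between the two confidence values. Function names are hypothetical, and the noise magnitudes are not calibrated to the σ values in Table 2.

```python
import math
import random


def confidence(x_left, x_right, c):
    """Posterior probability of the chosen orientation for one interval."""
    p_left = 1.0 / (1.0 + math.exp(-c * (x_left - x_right)))
    return max(p_left, 1.0 - p_left)


def p_bet_on_tp(c, sigma_d, n_trials=20000, seed=1):
    """Simulated proportion of bets placed on the target-present interval."""
    rng = random.Random(seed)
    bets = 0
    for _ in range(n_trials):
        # Target-present interval: left-tilted target centred at (c, 0).
        # By symmetry, right-tilted targets give the same betting rate.
        conf_tp = confidence(rng.gauss(c, 1.0), rng.gauss(0.0, 1.0), c)
        # Target-absent interval: pure noise centred at the origin.
        conf_ta = confidence(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), c)
        # Decisional noise corrupts only the confidence comparison, so
        # orientation discrimination accuracy itself is left untouched.
        noise = rng.gauss(0.0, sigma_d) if sigma_d > 0 else 0.0
        if conf_tp - conf_ta + noise > 0:
            bets += 1
    return bets / n_trials


for sigma_d in (0.0, 0.1, 0.2, 0.3):
    print(f"sigma_d = {sigma_d:.1f}  p(bet on TP) = {p_bet_on_tp(1.5, sigma_d):.3f}")
```

As `sigma_d` grows, the simulated betting rate drifts toward 50% while discrimination accuracy stays fixed, which is the direction of change that the model comparison in Table 2 penalizes.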

Table 2.

DOI: http://dx.doi.org/10.7554/eLife.09651.008

Expt	Subject	σ_d = 0 (Ideal observer)	σ_d = 0.1	σ_d = 0.2	σ_d = 0.3
1	1	0.465	0.459	0.456	0.447
	2	0.580	0.578	0.565	0.544
	3	0.470	0.464	0.448	0.428
	4	0.396	0.392	0.381	0.363
	5	0.649	0.655	0.645	0.628
	6	0.480	0.473	0.458	0.434
	7	0.453	0.452	0.444	0.427
	8	0.602	0.595	0.583	0.563
	9	0.503	0.509	0.512	0.504
2	1	0.624	0.624	0.622	0.612
	2	0.783	0.780	0.775	0.766
	3	0.777	0.778	0.767	0.753
Mean R² (σ)		0.565 (0.126)	0.563 (0.128)	0.555 (0.129)	0.539 (0.131)

R² values quantifying goodness of fit for ideal observer (σ = 0) and three alternative decisional noise magnitudes (σ > 0) which cause increasing degrees of Performance without Awareness. Decisional noise greater than 0 – i.e., increased level of Performance without Awareness – causes a drop in goodness of fit between model and human data. See Methods and Appendix 4 for more details.

Crucially, however, the ideal observer does exhibit Performance > Awareness (Figure 3G), and to a similar extent as our human participants (R² = 0.565; see Appendix 4 for details of goodness of fit metrics); trends for Type 2 hit and false alarm rates (Figure 3H), and percent correct conditional upon having bet on the target-present versus target-absent interval (Figure 3I), also match human data. That the ideal observer exhibits behavior that may seem suboptimal, and in the same pattern as human observers, confirms that this perhaps counterintuitive but optimal behavior arises from the confidence-comparison nature of the 2IFC confidence-rating task: the decision about orientation in the target-present interval is limited by one source of noise (the single target-present interval), but the comparison of confidence is limited by the system’s noise in both intervals. So even if confidence monotonically increases with accuracy for the target-present interval, there will be trials in which – by chance – the discrimination choice for the blank (target-absent) interval happens to seem more confident, that is, its posterior probability is larger. This will happen sometimes even on trials in which the observer gets the target-present orientation discrimination correct. In these trials, the observer (human or simulated) will select the target-absent interval.
This process will lead to the appearance of what we called Performance > Awareness, as displayed by our human participants and ideal observer (refer also to Figure 2 for additional explanation). Thus, the subjective ratings by human participants are already close to ideal, as if the actual effective threshold for subjective awareness is no different from the objective threshold for discrimination. Importantly, this is true despite the apparent measured differences in psychophysically defined thresholds (75%).

Discussion

Blindsight (Weiskrantz, 1986) is the intriguing demonstration of Performance without Awareness in neurological patients. Despite widely held beliefs by experts, here we found no evidence that it occurs in normal observers. Importantly, although the measured psychophysical threshold (75%) for awareness seemed to be above the objective discrimination threshold, computational analysis revealed that the actual effective thresholds are essentially the same; people’s subjective ratings are close to ideal, given their objective performance levels. This challenges longstanding beliefs regarding the nature of subjective versus objective thresholds in perceptual studies (Merikle et al., 2001; see survey results in Appendix 1). Our findings cannot rule out all forms of unconscious perception, such as subliminal priming, in which the evidence for unconscious processing is typically indirect benefits in reaction times (Hannula et al., 2005). However, our findings bear upon those studies, too. Traditionally, interpreting such effects as unconscious required that the relevant stimuli yield zero sensitivity in a direct task (d’ = 0). Recently, many have relaxed this requirement and considered subjectively reported lack of awareness as sufficient (Pessiglione et al., 2009; Soto et al., 2011), presumably because we (wrongly) believed that certain stimuli might surpass the objective threshold while still being below the subjective one. One may also argue that while objective threshold requirements are rigorous, the valid and meaningful measure is the subjective threshold (Charles et al., 2013; Merikle et al., 2001). Our results suggest this reasoning is flawed. If a stimulus surpasses the objective threshold, there is likely conscious experience; subjects likely report lack of awareness because they interpret the response options in relative terms in the context of stimuli of various strengths. This undermines claims that higher-cognitive phenomena – e.g. 
working memory, error detection, or motivation – can really operate unconsciously, if assessed with reference to subjective rather than objective thresholds (Charles et al., 2013; Pessiglione et al., 2009; Soto et al., 2011). Although the 2IFC confidence-rating procedure bypasses the response bias problem, interpreting the subjective vs. objective function is non-trivial: to determine whether participants’ Performance > Awareness behavior was optimal required detailed computational analysis. An alternative approach, which may be simpler, would be to compare the objective and subjective functions between task conditions, in a rationale similar to Lau and Passingham (2006). Although we found no evidence of ‘blindsight’ in normal observers, our study lays out the logic of what would be required to demonstrate it unequivocally. For example, it has recently been argued that TMS-induced ‘blindsight’ (Boyer et al., 2005) is contaminated by criterion bias (Lloyd et al., 2013). 2IFC confidence-rating may help resolve such issues without invoking theoretically complicated problems concerning signal detection theory (e.g., Heeks and Azzopardi, 2015). Thus, despite their negative nature, our findings may beget fruitful lines of inquiry to address which stimuli, procedures, or brain stimulation techniques can selectively impair subjective conscious experience, beyond impacting sheer objective processing sensitivity.

Materials and methods

Subjects

Twelve subjects (two women, ages 19–32, ten right-handed) gave written informed consent to participate in our behavioral experiments. All subjects had normal or corrected-to-normal eyesight, and wore the same corrective lenses for all sessions, if applicable. Behavioral experiments were conducted in accordance with the Declaration of Helsinki and were approved by the UCLA Institutional Review Board.

Stimuli and apparatus

Targets consisted of Gabor patches (sinusoidal gratings) at a spatial frequency of 0.025 cycles/pixel, tilted by 45° to the right or the left of vertical. Gratings subtended 500 pixels, or ~111 visual degrees, and were presented in a circular annulus with a Gaussian hull spatial constant of 100. On each trial, targets could take on one of thirteen possible contrast levels drawn from the range 15–90%. Masks consisted of white noise patches of random RGB values bandpass-filtered to a range of spatial frequencies immediately surrounding the spatial frequency of the target. They were presented in a circular annulus of identical size to the spatial envelope of the Gabor patch targets. All stimuli were displayed via a custom Matlab R2013a (Natick, MA) script utilizing PsychToolbox 3.0.12 on a gamma-corrected Dell E773c CRT monitor with a refresh rate of 75 Hz.
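As an illustration of the target stimulus described above, the following is a minimal Python sketch of a Gabor patch; it is not the authors' Matlab/PsychToolbox script. The function name `make_gabor`, the reading of the "spatial constant of 100" as a Gaussian standard deviation in pixels, the sign convention for tilt, and the mapping of contrast onto a [0, 1] luminance range are all assumptions made for this example.

```python
import numpy as np

def make_gabor(size_px=500, cycles_per_px=0.025, tilt_deg=45.0,
               contrast=0.5, envelope_sd_px=100.0):
    """Return a size_px x size_px Gabor patch with values in [0, 1].

    Assumptions (not taken from the paper): the envelope constant is a
    Gaussian SD in pixels, and positive tilt_deg rotates the grating in
    one direction from vertical while negative rotates it the other way.
    """
    # Pixel-centre coordinates, symmetric about the patch centre
    coords = np.arange(size_px) - size_px / 2.0 + 0.5
    x, y = np.meshgrid(coords, coords)
    theta = np.deg2rad(tilt_deg)
    # Axis along which luminance modulates; tilt_deg = 0 gives vertical stripes
    u = x * np.cos(theta) + y * np.sin(theta)
    grating = np.sin(2.0 * np.pi * cycles_per_px * u)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * envelope_sd_px ** 2))
    # Mean luminance 0.5; contrast scales the modulation depth
    return 0.5 + 0.5 * contrast * grating * envelope

tilt_one_way = make_gabor(tilt_deg=45.0, contrast=0.9)
tilt_other_way = make_gabor(tilt_deg=-45.0, contrast=0.9)
```

In this sketch the two tilts are mirror images of each other about the horizontal midline, so the two target categories differ only in orientation.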

Procedure – Experiment 1

Nine subjects participated in Experiment 1. Subjects were seated with their chins in a chinrest at a viewing distance of 42 cm from the screen. Targets and masks (Figure 1A) were presented for two to three frames (33–40 ms) each (jittered timing, with equal probability for two or three frames), with 33–40 ms ISI between masks and 0 ms ISI for target-mask or mask-target transitions, in a forward- and backward-masking paradigm in which three masks were presented before and three after the target presentation (i.e., the target was ‘sandwiched’ between mask presentations) (Figure 1B). The trial structure extends the two-by-two forced-choice (2x2FC) paradigm first introduced by Nachmias and Weber (1975) and subsequently employed to explore the relationship between detection and identification (e.g., Thomas et al., 1982; Watson and Robson, 1981), and more recently applied to research on confidence (Barthelmé et al., 2009, 2010; de Gardelle and Mamassian, 2014). We combined these procedure types. In our procedures, each trial consisted of two time intervals, only one of which contained the target. In target-absent intervals, the target presentation was replaced with blank frames, similar to the blank frames between masks, to maximize phenomenological similarity between target-present (TP) and target-absent (TA) intervals (Figure 1B). Unlike previous usage of the 2x2FC, however, we required observers to indicate target orientation on both target-present and target-absent intervals within a trial in addition to the final judgment type, despite the fact that there was a target in only one of the intervals. In target-present intervals, targets were presented at 45° tilted right or left from vertical at one of the possible contrasts. 
Following presentation of both intervals, observers pressed a key indicating which discrimination decision they would like to bet on (a measure of confidence; Type 2 judgment), and then indicated their discrimination choices for both intervals in order (leftward or rightward tilt; Type 1 judgment) (Figure 1C). In target-absent intervals, participants’ answers were coded as ‘correct’ with 50% probability. No feedback was provided on a trial-by-trial basis. To motivate subjects, we informed them that a target was present in both intervals, but that one might be harder to discriminate than the other. Subjects were informed that they would be awarded a point for every correct discrimination (Type 1 judgment), and an additional point every time they bet on an interval they discriminated correctly (Type 2 judgment), and total points were displayed at the end of the experiment; they were also told that if they earned more points than the previous participant, they would be paid an additional $10 bonus at the end of all sessions. In each behavioral session, trials were presented in a randomized full factorial design, counterbalancing interval order, in ten blocks of 52 trials per block. Every subject undertook five 60-minute sessions, for a total of 2600 trials spread across up to thirteen contrast levels, two orientations, and two interval presentation orders. Levels of contrast presented to each participant were titrated across sessions to ensure performance spanning approximately evenly from chance (50% correct) to 100% correct, resulting in no fewer than 200 trials per contrast level (10 trials per condition x 2 orientations x 2 interval orders x 5 sessions). Subjects were paid $10 per session.
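The trial counts above can be checked with a short sketch of the factorial design. The example assumes that every block contains each combination of thirteen contrast levels, two orientations, and two interval orders exactly once (13 x 2 x 2 = 52, which matches the stated block length); the evenly spaced contrast values and the names `make_block` and `target_interval` are hypothetical, since the actual levels were titrated per subject.

```python
import itertools
import random

def make_block(contrasts, rng):
    """One randomized block: each contrast x orientation x interval-order cell once."""
    orientations = ("left", "right")
    target_intervals = (1, 2)  # which of the two intervals contains the target
    trials = [
        {"contrast": c, "orientation": o, "target_interval": t}
        for c, o, t in itertools.product(contrasts, orientations, target_intervals)
    ]
    rng.shuffle(trials)
    return trials

rng = random.Random(0)
# Hypothetical levels: 13 evenly spaced contrasts spanning 15-90%
contrasts = [15 + i * 75 / 12 for i in range(13)]
session = [make_block(contrasts, rng) for _ in range(10)]  # ten blocks per session

trials_per_session = sum(len(block) for block in session)
print(trials_per_session, trials_per_session * 5)
```

Under these assumptions each session contains 520 trials, five sessions give the 2600 trials reported, and each contrast level is seen 40 times per session (200 times overall).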

Procedure – Experiment 2

Three subjects who had participated in Experiment 1 also participated in Experiment 2. Procedures for Experiment 2 were identical to those described above for Experiment 1, except for the feedback structure, observer’s knowledge about target-present versus target-absent intervals, and order of questions (Figure 1B). In Experiment 2, we wanted to motivate subjects to bet on the target-present interval as much as possible, to maximize the possibility of observers performing optimally (i.e., to alleviate any Performance > Awareness). So, we defined a ‘correct’ Type 2 judgment for the purposes of feedback only as a Type 2 hit, i.e. trials in which the observer correctly discriminated the target-present interval and bet on the target-present interval. Subjects were also informed that in one of the intervals the target was physically absent, and that betting on that interval would not earn them a point even if they ‘discriminated’ its orientation correctly (as before, they still had a 50% chance to earn a point for ‘correctly discriminating’ the target-absent interval; subjects were made aware of this structure). Additionally, we provided ‘correct/incorrect’ feedback on the Type 2 responses to further encourage betting on the target-present interval. Finally, we altered the question order such that after each interval was presented, subjects pressed a button to discriminate the interval, and then only after both intervals had been presented did they indicate which choice they would like to bet on. In this way, subjects were allowed the ability to monitor their own reaction times, which ought to be faster for target-present intervals on average (as target-absent intervals are simply guesses by definition); this would provide another source of potential information to contribute to confidence judgments, as it has been shown that subjects use reaction time monitoring to inform confidence judgments (Kiani et al., 2014). 
Points were awarded as in Experiment 1, and the same bonus payment motivation was employed. Also as before, participants completed five behavioral sessions each for Experiment 2, and were paid $10 per session.

Statistical analyses

For each subject in each experiment, data were collapsed across tilt (left/right), interval presentation order (first/second), and session for each contrast level. At each contrast level for each subject, we next calculated (a) percent correct orientation discrimination, (b) percent of trials in which the target-present interval was chosen, (c) Type 2 hit rate and Type 2 false alarm rate according to standard Type 2 signal detection theoretic definitions (Type 2 hit: correct orientation discrimination and bet on target-present interval; Type 2 false alarm: incorrect orientation discrimination and bet on target-present interval) (Fleming and Lau, 2014; Maniscalco and Lau, 2012), and (d) percent correct orientation discrimination conditional on having chosen the target-present versus target-absent interval. Group-level analyses and graphical presentation were conducted by binning subjects’ data into ten equally-spaced bins of percent correct orientation discrimination performance in the range 0.5 – 1 and calculating the mean and standard deviation of each of the above statistics for each bin. To interpolate between discrete data points, we fitted a kernel smoothing regression function to each observer’s data. This is a non-parametric approach to estimating the conditional expectation of a random variable, $E(Y \mid X) = f(X)$, where $f$ is a non-parametric function. This approach is based on kernel density estimation, implementing Nadaraya-Watson kernel regression (Nadaraya, 1964; Watson, 1964) via $\hat{f}_h(x) = \sum_{i=1}^{n} K_h(x - x_i)\, y_i \,/\, \sum_{i=1}^{n} K_h(x - x_i)$, where $K_h$ is a Gaussian kernel with bandwidth $h$. All analyses were carried out in Matlab R2013a (Natick, MA) and SPSS Version 22 (IBM Corporation; Armonk, NY).
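The kernel smoothing step can be sketched in a few lines of Python. This is a generic Nadaraya-Watson estimator with a Gaussian kernel, not the authors' Matlab code; the function name, the example data points, and the bandwidth value are invented for illustration.

```python
import numpy as np

def nadaraya_watson(x_query, x_obs, y_obs, bandwidth):
    """Kernel-weighted average of y_obs evaluated at each point in x_query."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    # Scaled distance between every query point and every observation
    z = (x_query[:, None] - x_obs[None, :]) / bandwidth
    # Gaussian kernel; its normalizing constant cancels in the ratio below
    weights = np.exp(-0.5 * z ** 2)
    return weights @ y_obs / weights.sum(axis=1)

# Made-up example: proportion correct (x) versus proportion of bets
# placed on the target-present interval (y) at six contrast levels
x_obs = np.array([0.50, 0.60, 0.70, 0.80, 0.90, 1.00])
y_obs = np.array([0.50, 0.55, 0.62, 0.70, 0.80, 0.90])
grid = np.linspace(0.5, 1.0, 11)
smoothed = nadaraya_watson(grid, x_obs, y_obs, bandwidth=0.05)
```

Because the estimate is a weighted average of the observed values, it always lies within their range, and a very large bandwidth flattens it toward the overall mean.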

Model space

Our model representation space extends Macmillan and Creelman’s (2004) two-dimensional signal detection theory (SDT) and related Bayesian (King and Dehaene, 2014) framework, in which stimulus categories are represented by bivariate Gaussian distributions centered along the axes in a Cartesian plane, and ‘noise’ (or a blank stimulus) is represented by a similar bivariate Gaussian centered at the origin (Figure 4). Although for this particular task we could have used a 1-dimensional space alternative (see e.g. Sridharan et al., 2014), to facilitate additional model variants (see Appendix 4) and possible future applications to stimuli that contain a mixture of multiple stimulus categories, we elected to present the model in a two-dimensional format. To accomplish both the orientation discrimination and 2IFC confidence judgments, on each simulated two-interval trial, two pairs of evidence values (representing the evidence in favor of a left- or right-tilted target) of the form $d = [d_{left}, d_{right}]$ are drawn: one sample is drawn from one of the signal distributions $S_i$ ($d_{TP}$, target-present intervals), and the other is drawn from the noise distribution ($d_{TA}$, target-absent intervals) (Figure 4).
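A simulated trial in this representation space might be generated as in the sketch below. It assumes unit-variance, uncorrelated noise by default and arbitrarily assigns the first axis to ‘left’ evidence and the second to ‘right’ evidence; the function name `sample_trial` and the evidence level used in the example are likewise illustrative choices.

```python
import numpy as np

def sample_trial(contrast, tilt, rng, cov=None):
    """Draw one evidence pair per interval: (d_tp, d_ta).

    Assumption: the signal distribution is centred at [contrast, 0] for a
    'left' target and [0, contrast] for a 'right' target, and the noise
    distribution at the origin, all sharing covariance `cov` (identity
    by default).
    """
    cov = np.eye(2) if cov is None else np.asarray(cov, dtype=float)
    if tilt == "left":
        mean_tp = np.array([contrast, 0.0])
    else:
        mean_tp = np.array([0.0, contrast])
    d_tp = rng.multivariate_normal(mean_tp, cov)       # target-present interval
    d_ta = rng.multivariate_normal(np.zeros(2), cov)   # target-absent interval
    return d_tp, d_ta

rng = np.random.default_rng(0)
d_tp, d_ta = sample_trial(contrast=2.0, tilt="left", rng=rng)
```

Passing a non-diagonal `cov` would correspond to correlated noise, one of the variants the text defers to Appendix 4.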

Inference process

Our ideal observer employs Bayesian inference in which each interval’s sample (i.e., evidence pair) $d$ is first categorized as belonging to $S_{left}$ or $S_{right}$ on the basis of the posterior probabilities of each, and then uses the posterior probability of the chosen orientation as a measure of confidence in each discrimination decision. We assume that each generating stimulus category, $S_i$, is dependent on the evidence in favor (or contrast) of the presented stimulus, $c$, and can be represented by a bivariate Gaussian distribution such that $S_{left} \sim N([c, 0], \Sigma)$ for a ‘left’ tilt and $S_{right} \sim N([0, c], \Sigma)$ for a ‘right’ tilt (Figure 5). Additionally, in the most basic formulation we define $\Sigma$ as the 2 × 2 identity matrix (although we explore other potentially more biologically plausible variants; see Appendix 4). We also assume the $c_{left}$ and $c_{right}$ axes (left and right tilt) to be orthogonal, although this constraint is not necessary for the model to capture behavioral performance (see Appendix 4).

Figure 5.

Illustration of increasing values for σ on the appearance of Performance without Awareness behavior, used to evaluate the possibility that human participants may have exhibited Performance without Awareness.

Increasing σ values resulted in increasingly poor R2 values (see Results), indicating that the ideal observer (which displays no performance without awareness) produces the best fit to human data.

DOI: http://dx.doi.org/10.7554/eLife.09651.009

Importantly, $c$ – the contrast or evidence level along each axis that gave rise to the data sample the observer sees – is unknown to the observer. So, because contrast evidence is a secondary (or nuisance) variable to the primary variable of interest – in this case, the orientation of the Gabor patch – the observer ‘integrates out’ or marginalizes over all possible contrast evidence levels to produce the posterior probability estimate of each tilt (Yuille and Bülthoff, 1996). Thus, the joint probability of each orientation and contrast evidence level is estimated through Bayes’ rule and then the secondary variable is integrated out, leaving estimation of the posterior probability of each orientation $S_i$ via the marginal distribution (Yuille and Bülthoff, 1996): $p(S_i \mid d) = \int p(S_i, c \mid d)\, dc$, where $p(S_i, c \mid d) \propto p(d \mid S_i, c)\, p(S_i, c)$. In the simplest form, both orientations have equal prior probability of 0.5. The observer then makes its orientation decision (for each interval) via $\text{choice} = \arg\max_i\, p(S_i \mid d)$. To determine which interval’s choice the observer is more confident in, the model refers to the magnitude of the posterior probabilities of each $S_i$ in each interval as a measure of the probability of having made a correct orientation discrimination choice, i.e. $p(\text{correct}) = p(S_{choice} \mid d)$. Then, the observer compares these posterior probabilities for the target-present (TP) and target-absent (TA) intervals by computing a decision variable $D$ via $D = p(S_{choice} \mid d_{TP}) - p(S_{choice} \mid d_{TA})$. The observer bets on the interval with the higher probability of being correct: if this decision variable $D$ is greater than 0, the observer selects the target-present interval to ‘bet’ on; if it is less than 0, the observer selects the target-absent interval. Sample code for this Bayesian ideal observer is included in Source code 1.
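The inference steps described above can be sketched as follows. This is not the authors' Source code 1: it is a minimal Python illustration that assumes unit-variance uncorrelated Gaussian likelihoods, a uniform prior over the unknown evidence level on a discrete grid (`CONTRAST_GRID`, with hypothetical bounds), and a decision variable taken as the difference between the two intervals' maximum posteriors, which is one form consistent with the zero threshold described in the text.

```python
import numpy as np

# Assumed support for the unknown evidence level c (uniform prior on this grid)
CONTRAST_GRID = np.linspace(0.0, 5.0, 101)

def tilt_posterior(d, contrast_grid=CONTRAST_GRID):
    """Return [p(left | d), p(right | d)] after marginalizing over c."""
    d_left, d_right = d
    # Log-likelihoods for means [c, 0] (left) and [0, c] (right), identity covariance
    log_like_left = -0.5 * ((d_left - contrast_grid) ** 2 + d_right ** 2)
    log_like_right = -0.5 * (d_left ** 2 + (d_right - contrast_grid) ** 2)
    # Subtract a common constant before exponentiating, for numerical stability
    shift = max(log_like_left.max(), log_like_right.max())
    marginal = np.array([np.exp(log_like_left - shift).sum(),
                         np.exp(log_like_right - shift).sum()])
    # Equal priors of 0.5 on each tilt cancel in the normalization
    return marginal / marginal.sum()

def observe_trial(d_tp, d_ta):
    """Orientation choices for both intervals, decision variable D, and the bet."""
    post_tp = tilt_posterior(d_tp)
    post_ta = tilt_posterior(d_ta)
    choice_tp = int(np.argmax(post_tp))  # 0 = left, 1 = right
    choice_ta = int(np.argmax(post_ta))
    # Confidence in each choice is the posterior of the chosen tilt
    D = post_tp.max() - post_ta.max()
    bet_on_target_present = bool(D > 0)
    return choice_tp, choice_ta, D, bet_on_target_present

print(observe_trial(np.array([3.0, 0.0]), np.array([0.2, -0.1])))
```

With strong evidence along one axis the posterior for that tilt approaches 1, whereas a sample near the origin yields posteriors near 0.5, so the sketch tends to bet on the interval whose sample lies farther along one axis.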

Evaluation of model performance

We examined the relative agreement between our model’s predictions and collected behavioral data by calculating the multinomial likelihood of the model given the observed data, which has previously been used within a signal detection framework. Details of goodness of fit calculations are described in Appendix 4. To evaluate whether human participants exhibited Performance without Awareness, we needed to cause the model to also exhibit Performance without Awareness. We therefore degraded the 2IFC confidence judgment process in the following way: on each trial, after the orientation decision had been reached, we programmed an added decisional noise parameter, $\sigma_D$, such that the decision variable $D$ calculated as in Equation 5 was corrupted by additive Gaussian noise with mean 0 and standard deviation $\sigma_D$, such that $D' = D + \varepsilon$, where $\varepsilon \sim N(0, \sigma_D^2)$. This causes the model to perform closer to chance at higher levels of orientation discrimination performance, i.e. to exhibit Performance without Awareness at increasing objective performance levels (Figure 5). We tested three decisional noise magnitudes – 0.1, 0.2, and 0.3 – and calculated the goodness of fit (see Appendix 4) for each $\sigma_D$ for each subject.
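The effect of the decisional noise can be illustrated with a brief simulation. In this sketch the noiseless decision variable is fixed at an arbitrary positive value (0.1, chosen only for illustration) so that the noise-free model would always bet on the target-present interval; the function name `noisy_bet` is hypothetical.

```python
import numpy as np

def noisy_bet(D, sigma_d, rng):
    """Bet on the target-present interval when the noise-corrupted D exceeds 0."""
    D = np.asarray(D, dtype=float)
    noise = rng.normal(0.0, sigma_d, size=D.shape)
    return (D + noise) > 0

rng = np.random.default_rng(1)
D = np.full(20000, 0.1)  # illustrative noiseless decision variable on every trial
for sigma_d in (0.0, 0.1, 0.2, 0.3):
    rate = noisy_bet(D, sigma_d, rng).mean()
    print(f"sigma_D = {sigma_d:.1f}: bets on target-present = {rate:.3f}")
```

As the noise standard deviation grows relative to the size of the decision variable, the proportion of bets on the target-present interval falls toward 0.5 even though the orientation decision itself is untouched, which is the dissociation the degraded model is designed to produce.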

Alternative models

We also examine three other possible contributing factors: correlated noise/non-orthogonal source distributions, signal-dependent (multiplicative), and signal-independent (additive) noise (see Appendix 4). These factors do not affect the qualitative trend of the model’s performance. For completeness, we also examine two other decision rules, detailed in Appendix 5: a heuristic observer which does not ignore contrast evidence as above, but explicitly estimates the most likely contrast level via hierarchical Bayesian inference (Yuille and Bülthoff, 1996); and a heuristic likelihood comparison observer (similar to Barthelmé et al., 2009). Importantly, the hierarchical model produced behavior similar to the ideal observer, indicating that such behavior is not idiosyncratic or specific only to the ideal observer presented above. The likelihood-only model, on the other hand, failed to produce predictions that matched collected behavioral data, either qualitatively or quantitatively.

Acknowledgements

This work was supported by the National Institutes of Health (US) to HL (grant number R01NS088628). We thank Brian Maniscalco, Dobromir Rahnev, Hongjing Lu, and Zili Liu for helpful comments.

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your work entitled "Human observers have optimal introspective access to perceptual processes even for visually masked stimuli" for peer review at eLife. Your submission has been favorably evaluated by Timothy Behrens (Senior Editor) and three reviewers, one of whom, Matteo Carandini, is a member of our Board of Reviewing Editors. The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. The following individuals responsible for the peer review of your submission have agreed to reveal their identity: Matteo Carandini (Reviewing Editor and Reviewer 3) and David Burr (Reviewer 1). Two other reviewers remain anonymous.

This article has the potential to be important and well cited. Methodologically, it is one of the very best articles that tackle this question of "perception without awareness". Using a new objective technique by Pascal Mamassian the authors argue that there is no evidence for "blindsight" - above-chance discrimination without awareness - in typical healthy adults. The application of appropriate psychophysical methods/Bayesian modelling is refreshing in a (sub-)field that has often relied on ad-hoc assumptions about how visual signals are encoded and converted to a decision. However, at present the quality of the data is not sufficient to allow the reader to draw unambiguous conclusions. For this paper to have the importance it deserves, it has to have better data. 
This data should be easy to obtain: it should be straightforward to run the tests on more subjects. Essential revisions: 1) The main finding rests on the result that participants wagered on the "signal present" interval above chance even when discrimination was at 51%. For this statement to be believable, one needs extra solid statistics. This should be achieved through truly independent samples, i.e. more subjects, not through resampling and statistical values reported to 4 digits of accuracy. Using 3 subjects may be the norm in psychophysics, but here the statistics are essential for the main message and for the claim to be believable one would want easily twice as many. These are easy and cheap measurements so it is not clear why one shouldn't expect a large number of subjects. As well as collecting a couple more subjects, one would also want individual data to be displayed, perhaps as a scatterplot of orientation vs confidence. At the moment, the variability in the current data is such that it's hard to tell whether or not there is a point where discrimination grows but confidence remains flat. Once new data are acquired, this should become a point that can be clearly judged by eye. 2) The paper first establishes by survey that most neuroscientists believe that perception can occur without awareness. This part of the study is amusing but it is unlikely to have lasting value. The figure can easily be replaced by words, and the whole thing dealt with in the Introduction as a motivator for the main part of the paper. 3) The paper should be made more interesting and understandable to general readers, avoiding or at least defining jargon (such as "Type 2 hit rate") and dropping the staid structure of psychophysics papers that divide results as a sequence of experiments. [Editors' note: further revisions were requested prior to acceptance, as described below.] 
Thank you for resubmitting your work entitled "Human observers have optimal introspective access to perceptual processes even for visually masked stimuli" for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior Editor) and a Reviewing Editor, Matteo Carandini. The manuscript has been markedly improved but there are some remaining minor issues that need to be addressed before acceptance. These minor issues all concern the text, which describes the material in a way that is sometimes rushed and garbled. They can all be solved with a simple round of editing. 1) A useful rule of thumb is to describe the figures one by one, without relying on people having to read captions or future parts of the paper or Methods. 2) It is premature to point to Figures 1 and 2 in Introduction. To read them and digest them is too much to ask to a reader who is still in Introduction, and has had no text to explain those two figures. That's what Results is for. 3) The Results section seems in a hurry to give take-home messages, without taking the time explain the figures. Please devote at least a paragraph to a description of Figure 1 (not zero words, as currently in Line 109 the first line of subsection “Behavioral Experiments”), guiding the reader through it. Similarly, please devote at least a paragraph to a description of Figure 2, and guide the reader through it. 4) Related to the previous points, please move much of the material from captions to main text. Captions should be used to explain what is in the graphs. They can also be used to described results and take-home messages if desired, but this is not a requirement, and that material should certainly also be in the main text. 
5) It is not advisable to use main text to refer the reader to a figure caption (L119 last line of first paragraph subsection “Behavioral Experiments”), especially when that caption in turn contains another pointer, to another part of Results and to another caption (L152-153 Figure 2 legend). Please unravel all that material. 6) Figure 2 still has some jargon, e.g. the word "Metacognitive", which is not explained anywhere in the paper. Also, "Type 2 hit rate" remains mysterious to a non-expert reader. Is there a Type 1 hit rate? How about Type 3? What is a Type? Please define in text or do not use. 7) Subsection “Behavioral Experiments” abruptly refers to "Experiment 2" and "Experiment 1", as if the readers already knew that there are two experiments. Explain that there are two experiments, called "1" and "2", explain their differences in design, and then explain that you saw no difference in the results, pointing to appropriate panels in Figure 3. In fact, all this can be done after having described Figures 1 and 2: the logical place seems to be when describing Figure 3. 8) Perhaps consider whether it is really a good idea to anticipate the take-home message of the paper on the 8th line of Results (Line 115)? This adds to the feeling that the paper is rushing to give a result without really explaining much. Perhaps it is ok to do this after the paper has been edited and Figures 1 and 2 have been properly described. 9) This list of suggestions ends here but it would be good to look at the whole paper for clarity and readability. The authors may be too immersed in the results and in the prose to be the best judges of this, so asking a naive colleague might help. Essential revisions: 1) The main finding rests on the result that participants wagered on the "signal present" interval above chance even when discrimination was at 51%. For this statement to be believable, one needs extra solid statistics. This should be achieved through truly independent samples, i.e. 
more subjects, not through resampling and statistical values reported to 4 digits of accuracy. Using 3 subjects may be the norm in psychophysics, but here the statistics are essential for the main message and for the claim to be believable one would want easily twice as many. These are easy and cheap measurements so it is not clear why one shouldn't expect a large number of subjects. As well as collecting a couple more subjects, one would also want individual data to be displayed, perhaps as a scatterplot of orientation vs confidence. At the moment, the variability in the current data is such that it's hard to tell whether or not there is a point where discrimination grows but confidence remains flat. Once new data are acquired, this should become a point that can be clearly judged by eye. We agree. We have collected an additional 6 subjects’ worth of data, making 9 total subjects in Experiment 1; we opted to focus on Experiment 1, as Dr. Carandini (Reviewer 3) pointed out that Experiment 2 is essentially a replication and does not necessarily need its own section anyway. Instead of using bootstrapping and t-tests of the fitted psychometric functions (see also our response to Dr. Carandini’s comment about Figure 4), we rely on quantitative goodness of fit metrics between a version of the Bayesian observer modified to produce sub-optimal Performance without Awareness to demonstrate that the ideal observer (which of course does not produce Performance without Awareness) provides the best fit to the human data. We also overlay the group mean data on a scatterplot of individual subjects’ orientation percent correct versus betting on the target-present interval as suggested, to allow the reader to easily see that the pattern of responses in no way resembles Performance without Awareness. This is an excellent suggestion for visualization of the data that really drives home the message. We have also included the individual subjects’ data in Appendix 2, as before. 
2) The paper first establishes by survey that most neuroscientists believe that perception can occur without awareness. This part of the study is amusing but it is unlikely to have lasting value. The figure can easily be replaced by words, and the whole thing dealt with in Introduction as a motivator for the main part of the paper. Agreed. We have done as suggested, and moved the details of the survey study and its results to Appendix 1. We now include the following text as a summary of the survey in the Introduction: “We conducted an informal survey to confirm this popular belief, which also revealed that convincing evidence for this phenomenon is believed to be lacking. We asked survey participants three key questions: (1) “Do you believe in subliminal perception?” (2) “Do you believe that the subjective threshold for awareness is above the objective discrimination threshold?” and (3) “If ‘yes’, do you believe this has been convincingly demonstrated in the literature?” Most respondents reported believing that subliminal processing exists (94%), but also that they did not believe it had been convincingly demonstrated in the literature (64%). These belief patterns were shown even among those who reported having published on subliminal or unconscious perception (94% and 61%, respectively). See Appendix 1 for full text of questions and detailed survey results.” 3) The paper should be made more interesting and understandable to general readers, avoiding or at least defining jargon (such as "Type 2 hit rate") and dropping the staid structure of psychophysics papers that divide results as a sequence of experiments. Also agreed. We have removed all but the most essential abbreviations throughout the text, opting instead to spell out the terms in words. We also now accompany any necessary jargon-y phrases (e.g. Type 2 hit rate) with definitions. 
To help the paper flow better, we have also combined the results from Experiments 1 and 2 into a single Behavioral Experiments results section, in response to the point that Experiment 2 is basically a replication of Experiment 1. The new text reads: “Because results are very similar across the two experiments, we combined results from both and performed a two-tailed one-sample t-test to assess whether this predicted percentage betting on the target-present interval significantly diverged from 75%. This analysis revealed that observers bet on the target-present interval significantly less than 75% of the time at 75% correct orientation discrimination accuracy (Figure 3A and B, Table 1). Thus, observers exhibited Performance > Awareness (see also Modeling Results, below).” We have also removed the long lists of statistics (as we changed the statistical tests used, see specific comments below), and replaced them with tables where appropriate. To clarify the section of text that contained the most jargon before, we have placed an additional heading of “2IFC detection?” in the results section, to help the reader understand what is being discussed. In this section, the modified text reads (in context): “To confirm that subjects were indeed rating confidence, we plotted Type 2 hit rate (placing a bet on a correct orientation discrimination decision) and Type 2 false alarm rate (placing a bet on an incorrect orientation discrimination decision) against orientation discrimination accuracy (Figure 3B and E). Subjects displayed increasing Type 2 hit rate as a function of orientation discrimination accuracy, whereas Type 2 false alarm rate remained relatively flat at around 50% (chance level) across increasing orientation discrimination accuracy.” In the Bayesian Ideal Observer Model section, we also have removed acronyms. 
The new text reads: “Crucially, however, the ideal observer does exhibit Performance > Awareness (Figure 3G), and to a similar extent as our human participants (R2 = 0.565; see Appendix 4 for details of goodness of fit metrics); trends for Type 2 hit and false alarm rates (Figure 3H), and percent correct conditional upon having bet on the target-present versus target-absent interval (Figure 3I), also match human data.” [Editors' note: further revisions were requested prior to acceptance, as described below.] 1) A useful rule of thumb is to describe the figures one by one, without relying on people having to read captions or future parts of the paper or Methods. Thank you for this suggestion. We have now moved the majority of information from the captions into the main text, and/or expanded the information previously contained in the captions in the main text as well. 2) It is premature to point to Figures 1 and 2 in Introduction. To read them and digest them is too much to ask to a reader who is still in Introduction, and has had no text to explain those two figures. That's what Results is for. Agreed. We have now removed references to Figures 1 and 2 in the Introduction. 3) The Results section seems in a hurry to give take-home messages, without taking the time explain the figures. Please devote at least a paragraph to a description of Figure 1, guiding the reader through it. Similarly, please devote at least a paragraph to a description of Figure 2, and guide the reader through it. Agreed. At the beginning of the Results section we have now explained the methods in more detail to give the reader some context, and also devoted a paragraph each to describing both Figure 1 and Figure 2. Here we also introduce the two experiments, since referring to Figure 1 means the reader will be introduced to there being two experiments at this juncture. The new text at the beginning of the Results section reads: “Nine human observers participated in two experiments of our 2IFC confidence-rating paradigm (Figure 1). In both experiments, participants viewed two intervals in which they were required to discriminate the orientation (right or left tilt) of a Gabor patch target embedded in forward- and backward-masks (Figure 1A and B), and judged which of the discrimination choices they felt more confident in. 
Crucially, in one of the intervals the target was absent (Figure 1B), such that above-chance discrimination performance was impossible. We performed two experiments to assess the potential contributions of question order, receipt of feedback, and a priori knowledge of the presence of a target-absent interval (Figure 1C). In Experiment 1, participants judged which decision they felt more confident in and then indicated their orientation decisions for both intervals, while in Experiment 2 they indicated their orientation discrimination decisions before selecting the more-confident interval. In Experiment 2, we also provided feedback on the confidence decision, and told participants that one interval contained no target; this information was withheld from participants in Experiment 1. Stimuli, timing details, and order of question prompts in the two experiments are also discussed in greater detail in the Methods section. For both experiments, we evaluated whether participants exhibited Performance without Awareness (Figure 2A) or Performance > Awareness (Figure 2B). In both cases, the response pattern of interest can be visualized as percent of time betting on the target-present interval as a function of percent correct orientation discrimination in the target-present interval. ‘Performance without Awareness’ (Figure 2A) would be supported if observers can discriminate the target above chance (>50% accuracy) while being unable to bet on their choices more often than betting on the target-absent interval (which necessarily yields chance-level performance). That is, observers correctly discriminate the target’s orientation more than 50% of the time, but bet on the target-present interval 50% of the time (i.e., they bet randomly on the target-present versus target-absent interval), indicating they are not aware of the information that contributed to their discrimination decision. 
If this were to occur, it would most likely happen at low discrimination performance levels, yielding a pattern of behavior similar to that presented in Figure 2A.

However, in psychophysics, thresholds can also be defined as midway between ceiling and floor performance (Macmillan & Creelman, 2004), such that threshold discrimination performance is defined as 75% accuracy rather than >50% (chance level). This concept can also be applied to subjective betting data in the sense that betting on the target-present interval could be considered “correct” or “advantageous” betting. In this sense (threshold = 75% correct performance), the subjective threshold for confidence might be above the objective threshold for discrimination. In other words, observers may bet on the target-present interval less often than they get the discrimination correct, but still above chance. This would occur because the orientation discrimination choice requires evaluation of only one interval (the one with the target in it) and therefore is subject to only one source of uncertainty, but the “betting” choice requires evaluation of both intervals, and therefore has two potential sources of uncertainty. This pattern of behavior (Figure 2B) may occur even if subjects do not display Performance without Awareness, and would be characterized by a pattern of responses that fall below the identity line (diagonal dashed line). We call this possibility ‘Performance > Awareness’.”

4) Related to the previous points, please move much of the material from captions to main text. Captions should be used to explain what is in the graphs. They can also be used to describe results and take-home messages if desired, but this is not a requirement.

Agreed. Done.
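The two response patterns defined in the quoted Results text (Figure 2A and Figure 2B) come down to comparing two percentages per condition. The Python sketch below is illustrative only: the function name `classify_pattern`, the `tolerance` margin, and the example numbers are assumptions introduced here, not part of the authors' analysis, which compared human data against a Bayesian ideal observer rather than applying fixed cut-offs.

```python
def classify_pattern(pct_correct, pct_bet_present, tolerance=2.0):
    """Label one (accuracy, betting) point using the definitions quoted above.

    pct_correct:     percent correct orientation discrimination in the
                     target-present interval (chance = 50).
    pct_bet_present: percent of trials on which the observer bet on the
                     target-present interval (chance = 50).
    tolerance:       hypothetical margin, in percentage points, for treating
                     a value as "at chance" or "on the identity line".
    """
    above_chance_discrimination = pct_correct > 50.0 + tolerance
    above_chance_betting = pct_bet_present > 50.0 + tolerance

    if not above_chance_discrimination:
        return "no above-chance performance"
    if not above_chance_betting:
        # Discrimination above chance, betting at chance (Figure 2A pattern).
        return "Performance without Awareness"
    if pct_bet_present < pct_correct - tolerance:
        # Betting above chance but below the identity line (Figure 2B pattern).
        return "Performance > Awareness"
    return "on or above identity line"


# Made-up example points, not data from the study.
examples = [(70.0, 50.0), (80.0, 65.0), (50.0, 50.0), (75.0, 75.0)]
labels = [classify_pattern(acc, bet) for acc, bet in examples]
```

A real analysis would replace the fixed `tolerance` with a statistical test or a model comparison, since sampling noise alone can move a point off the identity line.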
5) It is not advisable to use main text to refer the reader to a figure caption (last line of first paragraph subsection “Behavioral Experiments”), especially when that caption in turn contains another pointer, to another part of Results and to another caption (

Done. The new portion of the main text contains the information previously included in a caption with some additional explanation. Please see our response to Comment 3 for the new text.

6) We have significantly shortened the caption to Figure 2, moving most of it to the main text and expanding in greater detail. In the ‘2IFC detection?’ section, we have also now expanded the definition of Type 2 hits and false alarms in the main text, and contrasted them to Type 1 hits and false alarms according to standard signal detection theoretic definitions in order to make the definition clearer to the reader. The new text in context reads:

“One possible concern is that subjects were not rating confidence but instead engaging in 2IFC detection of the target-present interval. To confirm that subjects were indeed rating confidence, we plotted Type 2 hit rate and Type 2 false alarm rate against orientation discrimination accuracy (Figure 3B and E). A Type 2 hit is defined as placing a bet on a correct orientation discrimination decision, whereas a Type 2 false alarm is defined as placing a bet on an incorrect orientation discrimination decision. These are in contrast to Type 1 hits and false alarms, which can be defined as saying “left” when a left-tilted Gabor was presented and saying “left” when a right-tilted Gabor was presented, according to standard signal detection theoretic definitions (Green & Swets, 1966; Macmillan & Creelman, 2004).”

A brief explanation is also included in the caption for Figure 3, in case readers read the caption without reading the main text.
The relevant portion of that caption reads: “Panels B, E, and H show rising Type 2 hit rate (‘HR’; when subjects bet on a correct orientation discrimination choice) but relatively flat Type 2 false alarm rate (‘FAR’; when subjects bet on an incorrect orientation discrimination choice), …”

We have also added a definition of “metacognitive” to the ‘Unconscious “hunches”?’ section of the Results. That new text now reads:

“However, one concern might be that subjects are able to meaningfully rate confidence despite no subjective visual experience of the stimulus due to some sort of non-visual “hunch” or “feeling.” Indeed, such metacognitive insights (the ability to introspectively distinguish between correct and incorrect responses) have recently been reported even in the absence of objective task performance sensitivity, although not in the context of perception (e.g., Scott, Dienes, Barrett, Bor, & Seth, 2014).”

7) Subsection “Behavioral Experiments” abruptly refers to "Experiment 2" and "Experiment 1", as if the readers already knew that there are two experiments. Explain that there are two experiments, called "1" and "2", explain their differences in design, and then explain that you saw no difference in the results, pointing to appropriate panels in

As mentioned in our response to Comment 3, we have now included an expanded description of the two experiments at the beginning of the Results section. We believe the reference to the two experiments now makes sense, and when we discuss analyzing the two experiments together, we point the reader to the relevant panels of Figure 3.

8) Perhaps consider whether it is really a good idea to anticipate the take-home message of the paper on the 8th line of Results? This adds to the feeling that the paper is rushing to give a result without really explaining much.
Perhaps it is ok to do this after the paper has been edited and

As you anticipated, with the full descriptions of Figures 1 and 2 now in the main text, as well as a description of the differences between the two experiments, we believe this is okay at this point.

9) This list of suggestions ends here but it would be good to look at the whole paper for clarity and readability. The authors may be too immersed in the results and in the prose to be the best judges of this, so asking a naive colleague might help.

We have now had a naive colleague read the paper and make suggestions throughout. We have implemented these suggestions to aid the readability of the paper to non-experts in the field. All changes are tracked. For example, in the Introduction we now unravel the terms “objective” and “subjective” a bit more.

“In these demonstrations (e.g., blindsight: Weiskrantz, 1986) the subjective threshold for awareness (when a stimulus is consciously “seen”) seems well above the objective threshold for forced-choice discrimination (when a stimulus can be correctly identified): subjects can discriminate a target above chance performance, yet report no awareness of the target.”

We also include a definition of 2-interval forced choice (2IFC) in the appropriate spot at the end of the Introduction:

“Subjects discriminated two stimulus intervals, only one of which contained a target, and indicated confidence in their decisions using a 2-interval forced-choice procedure (2IFC), i.e. indicating which of the two discrimination decisions they felt more confident in.”

And to give the reader a little more context for the paper, we include the following statement at the end of the Introduction:

“Here, we explored whether such Performance without Awareness occurs in normal observers in two behavioral experiments, and compared these results to predictions of a Bayesian ideal observer.”

These changes are accompanied by minor insertions and word substitutions throughout.
We believe these changes improve the clarity of the manuscript in the manner suggested by the reviewer.
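The Type 2 hit and false alarm definitions quoted in the response to Comment 6 amount to two conditional proportions: how often a bet was placed on a correct decision, and how often on an incorrect one. The Python sketch below is a minimal illustration under assumptions made here: the function name `type2_rates`, the `(correct, bet_on)` input layout, and the example data are hypothetical, and the excerpt does not say how decisions in the target-absent interval are scored, so that choice is left to the caller.

```python
def type2_rates(decisions):
    """Return (Type 2 hit rate, Type 2 false alarm rate).

    decisions: iterable of (correct, bet_on) boolean pairs, one pair per
               orientation discrimination decision.
      correct: the orientation decision was right.
      bet_on:  the observer placed the bet on this decision.
    """
    decisions = list(decisions)
    n_correct = sum(1 for correct, _ in decisions if correct)
    n_incorrect = len(decisions) - n_correct

    # Type 2 hit: a bet placed on a correct decision.
    hits = sum(1 for correct, bet_on in decisions if correct and bet_on)
    # Type 2 false alarm: a bet placed on an incorrect decision.
    false_alarms = sum(1 for correct, bet_on in decisions
                       if not correct and bet_on)

    hit_rate = hits / n_correct if n_correct else float("nan")
    fa_rate = false_alarms / n_incorrect if n_incorrect else float("nan")
    return hit_rate, fa_rate


# Made-up decisions for illustration only (not data from the study).
example = [
    (True, True), (True, True), (True, False), (False, True),
    (False, False), (False, False), (True, True), (False, False),
]
hr, far = type2_rates(example)
```

Computing these two rates separately at each discrimination accuracy level gives the kind of curves the caption describes: a hit rate that rises with accuracy alongside a comparatively flat false alarm rate.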

Appendix 2–Table 1.

Mean value of criterion for each subject, demonstrating that subjects did not display biases to say ‘left’ versus ‘right’ in the orientation discrimination task.

DOI: http://dx.doi.org/10.7554/eLife.09651.016

Experiment	Subject	c
1	1	0.047
	2	0.074
	3	-0.039
	4	-0.029
	5	-0.013
	6	-0.030
	7	0.0267
	8	-0.039
	9	0.003
2	1	0.229
	2	0.004
	3	0.035
Mean (σ)		0.022
t(11)		1.04
p		0.3215
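The summary rows of the table can be checked directly from the twelve listed criterion values. The Python sketch below assumes the reported statistic is a one-sample t-test of the mean criterion against zero; the excerpt does not state this explicitly, but 12 rows give 11 degrees of freedom, matching t(11). The helper name `one_sample_t` is introduced here for illustration. The p-value is not recomputed because the standard library has no Student's t distribution.

```python
import math
import statistics

# Criterion values c copied from the table rows above
# (Experiment 1, subjects 1-9; Experiment 2, subjects 1-3).
criterion_values = [
    0.047, 0.074, -0.039, -0.029, -0.013, -0.030, 0.0267, -0.039, 0.003,
    0.229, 0.004, 0.035,
]


def one_sample_t(values, popmean=0.0):
    """Return (mean, t statistic, degrees of freedom) for H0: mean == popmean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)       # sample SD (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return mean, (mean - popmean) / standard_error, n - 1


mean_c, t_stat, df = one_sample_t(criterion_values)
print(f"mean c = {mean_c:.3f}, t({df}) = {t_stat:.2f}")
```

Under that assumption the recomputed mean and t statistic agree with the tabulated 0.022 and 1.04, which supports the caption's claim of no systematic bias toward ‘left’ or ‘right’.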

32 in total

Review 1. Confidence and accuracy of near-threshold discrimination responses.

Authors: C Kunimoto; J Miller; H Pashler
Journal: Conscious Cogn Date: 2001-09

2. A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings.

Authors: Brian Maniscalco; Hakwan Lau
Journal: Conscious Cogn Date: 2011-11-08

3. Unconscious perception: a model-based approach to method and evidence.

Authors: Michael Snodgrass; Edward Bernat; Howard Shevrin
Journal: Percept Psychophys Date: 2004-07

4. Relative blindsight in normal observers and the neural correlate of visual consciousness.

Authors: Hakwan C Lau; Richard E Passingham
Journal: Proc Natl Acad Sci U S A Date: 2006-11-21 Impact factor: 11.205

5. Effect of eccentricity on the relationship between detection and identification.

Authors: J P Thomas
Journal: J Opt Soc Am A Date: 1987-08 Impact factor: 2.129

6. Blindsight in normal observers.

Authors: F C Kolb; J Braun
Journal: Nature Date: 1995-09-28 Impact factor: 49.962

7. Discrimination at threshold: labelled detectors in human vision.

Authors: A B Watson; J G Robson
Journal: Vision Res Date: 1981 Impact factor: 1.886

Review 8. Evaluation of a 'bias-free' measure of awareness.

Authors: Simon Evans; Paul Azzopardi
Journal: Spat Vis Date: 2007

9. Evaluation of objective uncertainty in the visual system.

Authors: Simon Barthelmé; Pascal Mamassian
Journal: PLoS Comput Biol Date: 2009-09-11 Impact factor: 4.475

10. Brain-stimulation induced blindsight: unconscious vision or response bias?

Authors: David A Lloyd; Arman Abrahamyan; Justin A Harris
Journal: PLoS One Date: 2013-12-06 Impact factor: 3.240
