
Enhancing performance with multisensory cues in a realistic target discrimination task.

Caterina Cinel¹, Jacobo Fernandez-Vargas¹, Christoph Tremmel¹,², Luca Citi¹, Riccardo Poli¹.

Abstract

Making decisions is an important aspect of people's lives. Decisions can be highly critical in nature, with mistakes possibly resulting in extremely adverse consequences. Yet, such decisions often have to be made within a very short period of time and with limited information. This can result in decreased accuracy and efficiency. In this paper, we explore the possibility of increasing the speed and accuracy of users engaged in the discrimination of realistic targets presented for a very short time, in the presence of unimodal or bimodal cues. More specifically, we present results from an experiment where users were asked to discriminate between targets rapidly appearing in an indoor environment. Unimodal (auditory) or bimodal (audio-visual) cues could shortly precede the target stimulus, warning the users about its location. Our findings show that, when used to facilitate perceptual decisions under time pressure and under conditions of limited information in real-world scenarios, spoken cues can be effective in boosting performance (accuracy, reaction times or both), and even more so when presented in bimodal form. However, we also found that cue timing plays a critical role and, if the cue-stimulus interval is too short, cues may offer no advantage. In a post-hoc analysis of our data, we also show that congruency between the response location and both the target location and the cues can interfere with speed and accuracy in the task. These effects should be taken into consideration, particularly when investigating performance in realistic tasks.

Year: 2022 PMID: 35930533 PMCID: PMC9355224 DOI: 10.1371/journal.pone.0272320

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

In everyday life we constantly make decisions. Some of those decisions are complex, requiring a considerable amount of information and reflection, while others can be based on simple perceptual features, requiring for example rapid discrimination of external stimuli. In many cases, decisions can be highly critical in nature, with mistakes possibly resulting in extremely adverse consequences. Yet, decisions often have to be made quickly and with limited information. This is the case, for example, in situations where fast reactions to sudden stimuli are critical, such as when driving or in a military context. This study was part of a US-UK collaboration in a Multidisciplinary University Research Initiatives (MURI) program that aims at developing Brain-Computer Interfaces (BCIs) capable of enhancing decision-making performance in multisensory environments. In previous work, we have studied how, in those conditions, BCIs can improve the accuracy of such decisions, thereby acting as a form of cognitive enhancement [1-3]. In those studies, participants were presented with video streams of realistic scenarios (patrolling a dark corridor and manning an outpost at night) where users need to identify, as friends or foes and as quickly and accurately as possible, any unidentified uniformed soldiers that may present themselves [4, 5]. Within this context, given the adverse consequences associated with an incorrect or excessively slow identification, and given the ever increasing reliance on technology to assist soldiers, it seemed reasonable and ethical to attempt to improve decision speed and accuracy by providing the decision makers with additional information through verbal (auditory or audio-visual) communication from a remote human or AI assistant.
To achieve this goal, the key questions are, of course, what timing and what perceptual modality (or modalities) are best for communicating the additional information, so that it can help rather than confuse, distract or delay the decision maker. As a first step in this direction, in this study, we explore whether perceptual decisions in a patrolling task can be enhanced with the use of communication cues informing the decision maker of the approximate location (hemifield) in which an imminent target stimulus (a soldier) is about to appear. To help simplify the analysis, we assumed that the advice is always correct and we also considered a simplified version of the corridor scenarios, where two static images were presented, instead of video streams. A preliminary analysis of this experiment was presented in a short conference paper [6].

Background

In this section we review relevant literature and explain its links to our study.

Unimodal spatial cueing

In the literature, there is evidence showing that response times to a sudden visual stimulus can be accelerated when its spatial location is cued by a preceding stimulus (if cue and stimulus appear one after the other within a specific temporal window), compared to when no cue is presented. This has been extensively studied, particularly in simple tasks, using the classic spatial cueing paradigm [7-9]. Typically, a distinction is made between exogenous cues and endogenous cues. Exogenous cues are presented at the cued location and cause attention to move to it automatically and involuntarily before the task-relevant stimulus is presented. Endogenous cues are symbolic cues, such as an arrow pointing towards the direction where the stimulus will appear, or a word (e.g., “left” or “right”), providing spatial information without being in the cued location. These cause a voluntary and rapid shift of attention towards the referred location. Furthermore, in spatial cueing studies, a varying proportion of invalid cues (in which the target stimulus appears in a location other than the one indicated by the cue, resulting in even worse performance than in the absence of a cue) is typically presented amongst valid cues. When using simple stimuli in classic cueing paradigms, with exogenous cues, the advantage in validly cued trials is typically observed with a cue-stimulus interval between 300 and 500 ms. Intervals longer than that tend to produce the opposite effect, with valid trials resulting in slower reaction times (RTs) compared to neutral trials, an effect known as inhibition of return (IOR; [10]). However, whether or not this is the case for endogenous cues is more controversial. Spatial cueing has also been studied in realistic situations.
For example, in driving simulations [11-13] or air traffic control simulations [14, 15] a spatial cue facilitates a prompter response to a potentially dangerous event, particularly in situations of high information load, where situation awareness can be particularly challenging. However, most of what is currently known of how spatial cueing can affect performance, and when and how it can be more effective, is the result of investigations carried out within a single modality—the visual perception domain.

Cross-modal and multimodal spatial cueing

Research on crossmodal spatial attention has shown that spatial cueing works even when cue and stimulus are from different modalities [16-18], and also in more realistic conditions. For instance, Begault [19] found that the performance of airline crews in a visual search task improved when using spatial auditory cues in the same location as the target, compared to when the cue was a warning message (“traffic, traffic!”) with no spatial information. More recently, Ho and Spence [12] conducted a series of experiments investigating the benefits of using auditory spatial cues in a simulated driving task. They examined the effects of non-spatial non-predictive, spatial non-predictive, and spatially-predictive sound cues, as well as symbolic predictive and spatially-presented symbolic predictive verbal cues. In all cases, auditory cues coming from the direction relevant to the visual target improved the ability to detect those targets (see also other investigations on cross-modal links in attention, e.g., [11, 14]). Moreover, the advantage was enhanced when non-spatial, but semantically informative cues (the words “front” and “back”) were presented. Consistent with Ho and Spence’s findings [12], other studies [7, 12, 13, 20] have shown that spatial information cueing the location or direction of a task-relevant visual stimulus can be provided not only by an exogenous cue but also by endogenous, meaningful directional cues, such as spoken words, presented in a non-relevant location (i.e., at the centre of the display, when the target-relevant visual stimulus is presented at left/right, top/bottom or front/rear locations).

Multisensory perception

Evidence from studies on multisensory perception points to an advantage for multisensory stimuli, as compared to unisensory stimuli, in terms of efficiency and accuracy, and in terms of enhanced perception [21-26]. This is to be expected, given that in nature most external events are multisensory. This is also confirmed in all instances in which there is a push toward forming coherent percepts when there is multisensory incongruency [27-31]. The multisensory advantage has been observed at the behavioural level, whereby, for example, reaction times can be faster and accuracy greater for multisensory stimuli, compared to corresponding unisensory stimuli (e.g., [32, 33]). This also seems to apply to spatial cueing, whereby bimodal cues can be more effective in attracting attention to a location than informative unimodal cues [17, 34]. At the neural level, the multisensory advantage has also been observed both in the form of multisensory neurons in several parts of animal brains and in the form of direct connections between unisensory areas [23, 35]. Our cognitive system seems to assume that multisensory events that are very close temporally and spatially belong together [27, 29, 36], and both behavioural and neural responses to the environment can be enhanced when the information provided is multisensory [25, 37-39].

Temporal preparedness

In relation to the timing of stimulus presentation, it is well known that if a warning stimulus can be provided before a perceptual task (conveyed via a so-called imperative stimulus) needs to be carried out, this may enhance the subject’s preparation, resulting in shorter reaction times and possibly also higher accuracy [40-42]. Typically, when the temporal interval between warning and imperative stimulus (which is called the foreperiod) varies randomly, the longer the foreperiod, the shorter the RTs. In some conditions accuracy also varies with the foreperiod, although it is typically very high because imperative stimuli persist during the response period, so variations are relatively small.

Implications for cueing in our target discrimination task

Most of the evidence reviewed above suggests that, when using cues to enhance performance in perceptual decision-making tasks, such as the one considered in our experiment, bimodal cues might be more effective than unisensory cues. However, this needed to be tested as there are several important differences between the conditions studied in the literature and our real-world setting. We consider them below. In relation to the spatial cueing literature, in our experiment we only have valid cues, i.e., the information provided by the verbal communication is always correct. Naturally, this is only an approximation as, in the real world, one cannot always guarantee that the additional information provided to a decision maker will be 100% accurate. Also, our stimuli were much more complex and realistic than the simple stimuli used in classic cueing paradigms and, as we indicated above, cues were endogenous. For these reasons, it seemed unlikely that the short cue-stimulus intervals used with exogenous cues (300-500 ms) would be maximally advantageous and, thus, we decided to explore longer cue-stimulus intervals (500, 700 and 900 ms). Because in our experiment participants knew that the target stimulus would always appear at the cued location, we did not expect to find IOR. Also, while the cueing literature suggests that there is an advantage in providing cues and target stimuli in different sensory modalities, it is not clear whether providing bimodal cues might further enhance performance. For instance, would an audio-visual cue be more effective at moving attention to the location of an imminent visual stimulus than an audio-only cue? This is particularly important in the context of our study. Finally, the temporal preparation studies mentioned above suggest the possibility that temporal preparedness for the main task may also have an effect on performance (particularly, response times).
In such research, warning stimuli are typically uninformative in relation to the imperative stimulus, while, of course, spatial cues are informative, particularly when they are always valid. Also, in the literature, the decision tasks associated with the imperative stimulus are typically much simpler (e.g., the stimulus persists during the decision, while in our study targets only appear for a very short time) and, so, accuracy is much higher (e.g., 98-99%) than that in our experiment (approximately 85%). Nonetheless, there is the potential for cues to influence temporal preparedness, possibly contributing to improving performance. As we will discuss later, in some of the conditions of our experiment, cues are non-informative, so this allows us to explore to some degree whether there is a measurable foreperiod effect in our experiment.

Additional potential effects in our experiment

In our experiment, the target stimulus was presented either on the right or left side of the screen, and participants were to respond using the left and right mouse buttons. Also, in part of the trials, a spoken cue saying the word “left” or “right” indicated where the stimulus was going to appear. Therefore, a further aspect that needed to be examined was whether performance was also affected by: a) a possible interference resulting from the stimulus location (left or right of the display) and the response position (left or right mouse click), known as the Simon effect [43, 44], and b) the interference that might originate between the response location and the spoken cue (e.g., “left” vs. “right”), which is known as the spatial Stroop effect [45]. In this case, a semantic interference may arise when the spoken cue word is incongruent with the location of the left and right arrow keys or the left and right buttons of a mouse. We should note that the Simon effect has been shown to be determined by the relative location of a stimulus, rather than its absolute location in the visual field [45, 46]. Such effects can have practical implications in realistic environments, where tasks are often based on spatial features and manual responses. We will study this in a post-hoc analysis.

Materials and methods

In this study, we performed an experiment where participants had to rapidly decide whether a uniformed character, appearing suddenly and for a very short time on the left or right side of a poorly-lit realistic corridor, was wearing a cap or a helmet (see Fig 1). The experiment was divided into blocks. In some of the blocks, the stimulus display was preceded by a cue indicating whether the character/soldier would appear on the left or the right. The cue was either a spoken word (“left” or “right”) only, or a synthetic face uttering the cue word (with matching lip movements). Performance was then measured in terms of both accuracy and reaction times. More details are provided below.

Fig 1

a) The sequence of displays on each trial and their duration. The SOA varied randomly between 500, 700 and 900 ms, while the response display lasted until a response was given; b) The two types of target characters appearing in the stimulus display (helmet or cap); c) In each block, one of the four types of pre-stimulus displays was shown: No Cue, Visual Cue, Auditory Cue or Audio-Visual Cue.

Stimuli and procedure

Participants were seated at a distance of about 80 cm from the computer monitor, where the stimuli were presented. They were presented with full-screen displays showing an image of a long corridor with doors on either side, in which a uniformed character appeared (in a posture suggesting the character is walking across the corridor) for 250 ms (see Fig 1). Participants had to decide, as rapidly as possible, whether the character was wearing a helmet or a cap (see Fig 1(b)). The corridor was always shown centrally on the display, so that the two side walls of the corridor were always symmetric with respect to the screen midline (see Fig 1). The character could appear at different positions in the corridor, either horizontally (i.e., more or less lateral w.r.t. the centre of the display) or vertically (i.e., as if further away or nearer to the observer). The size of the character varied between 3.41° and 5.37° in height and between 1.27° and 4.17° in width (reflecting the varying distances along the corridor from the observer). The horizontal distance of the character from the centre of the corridor varied between 1.54° (nearer to the centre of the corridor) and 16.22° (nearer to the left or right wall), so the character never appeared exactly at the centre. Each trial started with a display showing a fixation cross for one second. This was followed by a pre-stimulus display representing the corridor. In some of the trials a cue was also presented together with the pre-stimulus display. When a cue was presented, this was either an Auditory Cue (AC) or an Audio-Visual Cue (AVC). The AC consisted of a voice uttering either the word “left” or the word “right”, to indicate the side of the corridor where the character would appear (with 100% validity) in the following display. The spoken cue was played by a loudspeaker positioned centrally, behind the monitor where the visual stimuli were presented.
The AVC consisted of a face, positioned at the top centre of the display (as shown in Fig 1(c)), uttering either the cue word “left” or the cue word “right”, with sound; note that the lip movements of the face pronouncing the spoken cue were only an approximation of the correct lip movements and so, on their own, they were insufficient to understand what word was being pronounced. In addition, in some of the trials a static face was presented concurrently with the pre-stimulus display. The face was exactly the same as in the AVC condition, except that neither lip movement nor sound was present. For simplicity, we called this Visual Cue (VC) (see Fig 1(c)), even though it did not act as a cue, i.e., it did not provide any information. We included VCs in the experiment to verify that the face used in AVCs, on its own, would not facilitate the task. Finally, in No Cue (NC) trials (see Fig 1(c)), only the pre-stimulus display (empty corridor) was shown. The experiment consisted of 16 blocks of 48 trials each. In each block, the type of pre-stimulus display was fixed, with four blocks for each pre-stimulus display type. The block order was randomised for each participant, and each block was preceded by a display informing the participant about the type of cue in that block. At the end of each block, the mean accuracy for that block was displayed as a form of feedback. Reaction Times (RTs) and accuracy were recorded. The time interval between the pre-stimulus display and the target stimulus, also known as Stimulus Onset Asynchrony (SOA), was varied randomly within each block among 500, 700 and 900 ms.
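The block and trial structure described above can be sketched in a few lines of Python. This is only an illustration, not the experiment software: the function name `build_schedule`, the dictionary keys, and the decision to draw SOA, target side and target type independently and uniformly on every trial are assumptions, since the text states only that the SOA varied randomly within each block.

```python
import random

PRE_STIMULUS_TYPES = ["NC", "VC", "AC", "AVC"]  # four pre-stimulus display types
SOAS_MS = [500, 700, 900]                       # stimulus onset asynchronies
BLOCKS_PER_TYPE = 4                             # 4 types x 4 blocks = 16 blocks
TRIALS_PER_BLOCK = 48


def build_schedule(seed=None):
    """Return a list of 16 blocks, each a list of 48 trial dictionaries."""
    rng = random.Random(seed)
    # The pre-stimulus type is fixed within a block; block order is shuffled.
    block_types = PRE_STIMULUS_TYPES * BLOCKS_PER_TYPE
    rng.shuffle(block_types)

    schedule = []
    for block_type in block_types:
        block = []
        for _ in range(TRIALS_PER_BLOCK):
            # Assumption: uniform, independent draws (no balancing constraint).
            side = rng.choice(["left", "right"])
            block.append({
                "pre_stimulus": block_type,
                "soa_ms": rng.choice(SOAS_MS),
                "side": side,
                "target": rng.choice(["cap", "helmet"]),
                # Spoken cues are 100% valid, so the cue word equals the side.
                "cue_word": side if block_type in ("AC", "AVC") else None,
            })
        schedule.append(block)
    return schedule


schedule = build_schedule(seed=2022)
```

A real implementation would probably balance SOA, side and target type within each block rather than sample them freely; the sketch leaves that open because the text does not specify it.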

Participants and ethical approval

Thirty-three participants in total took part in the experiment. However, data from participants whose accuracy in the task was below 60% (three participants) and participants who did not complete the experiment (two participants) were discarded. Therefore, here we present data from 28 participants in total. All participants had normal or corrected-to-normal vision. Participants signed a written informed consent form before taking part in the study. The research received ethical approval by the UK Ministry of Defence Research Ethics Committee and the University of Essex in June 2017 (Application Number: 832/MoDREC/17). All experiments were performed in accordance with the relevant guidelines and regulations.

Rejection criteria and statistical analyses

Reaction times longer than 1.5 s were considered outliers and discarded (on average 4% of trials were removed). In addition, as is typical in cognitive experiments, incorrect responses were also discarded from RT analyses. Data were analysed with within-subjects ANOVAs and paired t-tests, with the Benjamini-Hochberg correction for multiple tests. Effect sizes were measured using partial η².
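The two rejection steps can be illustrated with a short sketch. The function name `summarise_condition` and the trial dictionary keys are hypothetical, and computing accuracy on the trials that survive outlier removal is an assumption: the text does not say whether outlier trials also counted towards accuracy.

```python
from statistics import mean

RT_CUTOFF_S = 1.5  # from the text: RTs longer than 1.5 s are treated as outliers


def summarise_condition(trials):
    """Summarise one participant's trials in one condition.

    `trials` is a list of dicts with hypothetical keys
    "rt" (reaction time in seconds) and "correct" (bool).
    Raises statistics.StatisticsError if no usable trial remains.
    """
    # Step 1: discard outlier trials (RT above the cut-off).
    kept = [t for t in trials if t["rt"] <= RT_CUTOFF_S]
    # Assumption: accuracy is computed on the trials left after step 1.
    accuracy = mean(1.0 if t["correct"] else 0.0 for t in kept)
    # Step 2: RT analyses use correct responses only.
    correct_rts = [t["rt"] for t in kept if t["correct"]]
    return {
        "n_removed": len(trials) - len(kept),
        "accuracy": accuracy,
        "mean_rt": mean(correct_rts),
    }


example = [
    {"rt": 0.45, "correct": True},
    {"rt": 0.52, "correct": False},
    {"rt": 1.80, "correct": True},   # outlier, removed in step 1
    {"rt": 0.55, "correct": True},
]
summary = summarise_condition(example)
```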

Results

Effects of auditory cues and audio-visual cues

In this section the following two hypotheses were tested: a) That responses in trials with the spoken cue (both AC and AVC trials) are faster and more accurate than responses in trials with no spoken cue (NC and VC trials), and b) that responses in AVC trials are faster and more accurate than responses in AC trials. We had no specific hypotheses regarding any differences between NC and VC trials, and no specific hypothesis regarding the effect of the SOA interval, which was investigated to find the optimal SOA interval for boosting performance in realistic environments.

Reaction times

Mean RTs, according to SOA and pre-stimulus type, are displayed in Fig 2. The mean RT across all conditions was 489 ms (SD 107 ms).

Fig 2

Mean RTs with standard errors, according to SOA (500, 700 and 900 ms) and pre-stimulus type (NC, VC, AC and AVC).

A 3 × 4 within-subjects ANOVA showed that there was a main effect of SOA on RT (F(2, 27) = 13.47, p < .001, partial η² = .33), indicating that RTs were slower with shorter SOAs (see “Reaction Times” column in Table 1).
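Partial η² can be recovered from an F statistic and its degrees of freedom as η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this standard identity; the function name is hypothetical. The error degrees of freedom used in the example, (k − 1)(n − 1) with n = 28 participants, are an assumption about a fully within-subjects design without sphericity correction and are not the second value printed in the F(·, ·) notation of the text. Under that assumption the formula gives approximately .33 for the SOA main effect on RT, in line with the reported effect size.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


N_PARTICIPANTS = 28  # from the text

# Main effect of SOA on RT: 3 levels, so df_effect = 2.
# Assumed error df: (3 - 1) * (28 - 1) = 54.
eta_soa_rt = partial_eta_squared(13.47, 2, 2 * (N_PARTICIPANTS - 1))
```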

Table 1

SOA	Reaction Times (s)	Accuracy (proportion correct)
500ms	0.502 (0.109)	0.855 (0.061)
700ms	0.482 (0.110)	0.870 (0.052)
900ms	0.481 (0.106)	0.872 (0.056)
	p-values
500ms vs. 700ms	<0.001	0.041
500ms vs. 900ms	<0.001	0.017
700ms vs. 900ms	0.473	0.315

Top half: RTs (in seconds) and accuracy (proportions of correct responses), with standard deviations, in the different SOA conditions, regardless of pre-stimulus type. Bottom half: p-values of pairwise comparisons (one-tail t-tests, Benjamini-Hochberg adjusted), with statistically significant values in bold (α = .05).

Pairwise comparisons of all SOA pairs (regardless of pre-stimulus condition) showed that RTs in the 500 ms SOA were the slowest (p < .001 vs 700 ms SOA, and p < .001 vs 900 ms SOA), while the difference in RTs between the 700 ms and 900 ms SOAs was non-significant (p = .473). The ANOVA showed that there was no significant main effect of pre-stimulus type on RT (F(3, 27) = 2.04, p = .114, partial η² = .07), and, accordingly, pairwise comparisons of all pre-stimulus conditions (see “Reaction Times” column in Table 2) showed no significant differences (all p > 0.1).
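For readers unfamiliar with the Benjamini-Hochberg adjustment applied to these pairwise p-values, the following is a minimal sketch of the standard step-up procedure. The function name and the raw p-values in the example are hypothetical and are not taken from the study. Because adjusted values are forced to be monotone in rank, several comparisons can end up sharing the same adjusted p-value.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing monotonicity with a running minimum (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

In this hypothetical example the second and third comparisons receive the same adjusted value, because the running minimum carries the smaller scaled value down to the lower rank.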

Table 2

Pre-stimulus	Reaction Times (s)	Accuracy (proportion correct)
AC	0.489 (0.112)	0.871 (0.057)
AVC	0.476 (0.111)	0.885 (0.051)
NC	0.490 (0.120)	0.862 (0.069)
VC	0.500 (0.104)	0.846 (0.060)
	p-values
VC vs. NC	0.892	0.955
AC vs. NC	0.535	0.157
AC vs. VC	0.165	0.007
AVC vs. NC	0.165	0.020
AVC vs. VC	0.165	0.007
AVC vs. AC	0.165	0.044

Top half: RTs (in seconds) and accuracy (proportions of correct responses), with standard deviations, in the different pre-stimulus conditions, regardless of SOA. Bottom half: p-values of pairwise comparisons (one-tail t-tests, Benjamini-Hochberg adjusted), with statistically significant values in bold (α = .05).

There was, however, a significant interaction between pre-stimulus and SOA (F(6, 27) = 5.038, p < .001, partial η² = .16). We, therefore, performed pairwise comparisons (one-tail t-tests) of pre-stimulus within each SOA, to test whether: a) the spoken cue in AC and AVC trials would give a respective advantage over the NC and VC conditions (the spoken cue was 100% predictive); b) performance in AVC trials would be better than performance in AC trials. The corresponding p-values are shown in Table 3. This shows that there is a significant difference for the 500 ms SOA condition, where responses in AC trials were slower than in AVC trials. Statistically significant differences were also found within the 900 ms SOA. Here RTs in both AC and AVC trials were significantly faster than in VC trials, and RTs in AVC trials were also significantly faster than in NC trials. None of the differences within the 700 ms SOA was statistically significant.
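The directional comparisons listed in Table 3 (for example AVC < AC for RTs) rest on a paired t statistic computed from per-participant differences. The sketch below shows only that statistic; the function name and the example values are hypothetical. The one-sided p-value is the corresponding tail probability of a Student t distribution with n − 1 degrees of freedom (27 for the 28 participants here), which is not computed in this sketch because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for the differences a - b.

    `a` and `b` hold one value per participant, in the same order.
    A negative t supports the directional hypothesis "a < b".
    Raises ZeroDivisionError if all differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1


# Hypothetical per-participant mean RTs (seconds) in two conditions.
rt_avc = [0.50, 0.48, 0.52, 0.47]
rt_ac = [0.52, 0.49, 0.55, 0.50]
t_value, dof = paired_t_statistic(rt_avc, rt_ac)
```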

Table 3

Benjamini-Hochberg adjusted p-values from paired one-sided t-tests for pre-stimulus differences within each SOA, for reaction times (RTs) and accuracy (α = 0.05).

Statistically significant values are in bold.

Reaction Times	500ms	700ms	900ms
VC < NC	0.606	0.930	0.894
AC < NC	0.951	0.653	0.086
AC < VC	0.951	0.164	0.010
AVC < NC	0.606	0.418	0.043
AVC < VC	0.606	0.164	0.010
AVC < AC	0.025	0.418	0.375
Accuracy	500ms	700ms	900ms
VC > NC	0.911	0.970	0.899
AC > NC	0.600	0.348	0.151
AC > VC	0.129	0.024	0.012
AVC > NC	0.255	0.041	0.899
AVC > VC	0.091	<0.001	0.720
AVC > AC	0.208	0.053	0.989

Finally, we compared performance in each pre-stimulus condition between the different SOAs. As shown in Table 4, for RTs, in cued conditions, statistically significant differences exist for both the AC and AVC conditions between the 500 ms SOA and the two longer SOAs; additionally, RTs were significantly faster in AC trials with a 900 ms SOA than with a 700 ms SOA. By contrast, in the two non-cued/control conditions, NC and VC, only one statistically significant difference was identified: namely, for NC, between the 500 ms and 700 ms SOAs.

Table 4

Benjamini-Hochberg adjusted p-values from paired one-sided t-tests for differences between SOAs (all pairings) within each cue type, for reaction times (RTs) and accuracy (α = 0.05).

Statistically significant values are in bold.

RTs	NC	VC	AC	AVC
500 > 700	0.047	0.852	<0.001	0.011
500 > 900	0.222	0.852	<0.001	<0.001
700 > 900	0.838	0.852	0.048	0.180
Accuracy	NC	VC	AC	AVC
500 < 700	0.469	0.298	0.068	0.063
500 < 900	0.469	0.294	0.033	0.063
700 < 900	0.678	0.298	0.129	0.643

We should note that RTs in the NC and VC conditions did not show the typical slope associated with the foreperiod effect. These two conditions are the most similar to those used in foreperiod studies as the corresponding pre-stimulus displays are non-informative and so can be thought of as warning stimuli. The absence of statistical evidence for a foreperiod effect may be due to the additional resources required to parse the NC and VC displays (these displays have a much greater complexity than a typical warning stimulus), resulting in a reduced preparation for the imperative stimulus irrespective of SOA, or to the effect being relatively small compared with other effects.

Accuracy

The overall mean accuracy (across all participants and conditions) was 86.6% (SD = 5.4%). However, accuracy varied depending on SOA and pre-stimulus, as shown in Fig 3.

Fig 3

Mean accuracy (percentages) with standard error according to SOA (500, 700 and 900 ms) and pre-stimulus (NC, VC, AC and AVC).

A 3 × 4 within-subject ANOVA showed that there was a significant main effect of SOA (F(2,27) = 4.43, p = .017, partial η² = .14), with accuracy being lowest with the 500 ms SOA (85.6%, SD = 6.1), followed by the 700 ms SOA (87%, SD = 5.2), and highest with the 900 ms SOA (87.2%, SD = 5.6). We tested all pairings of the three SOAs, which showed a significant difference between the 500 ms and 700 ms SOAs (p = .041), and between the 500 ms and 900 ms SOAs (p = .017), but not between the 700 ms and 900 ms SOAs (p = .315) (see Table 1). The ANOVA also showed a significant main effect of pre-stimulus (F(3,27) = 7.97, p < .001, partial η² = .23), with AVC trials being the most accurate, followed by AC trials, NC trials and VC trials (see Table 2). Pairwise comparisons of the pre-stimulus variable, independently of SOA, showed that only the difference between AC and NC trials was non-significant (p = .131), while all differences between AC and VC (p < .001), AVC and NC (p = .017), AVC and VC (p < .001), AVC and AC (p = .037) were significant (see Table 2). The interaction between pre-stimulus and SOA was non-significant (F < 1, partial η² = .02). In pairwise comparisons, where differences between pre-stimuli were tested separately for the three SOAs, the results were the following (see also Table 3). In the 500 ms SOA, all differences were non-significant (all p-values > 0.7). In the 700 ms SOA, accuracy was significantly greater in AC than in VC trials (p = .029), while for AVC trials, accuracy was significantly greater than for NC trials (p = .034), VC trials (p = .0002) and AC trials (p = .044). Finally, with the 900 ms SOA the only significant difference was between AC and VC trials (p = .01).
Finally, if we look at the differences in accuracy between SOAs within each pre-stimulus condition in Table 4, we see that none of the non-cued (NC and VC) comparisons is statistically significant, while in the cued conditions one comparison (AC 500 < 900) is statistically significant and three more approach significance, effectively confirming the corresponding RT results.

Overall performance

In general, the results presented above indicate that there is a beneficial effect in the cued (AC and AVC) vs non-cued (NC and VC) conditions, which is typically manifested in either RTs (at the 900 ms SOA) or accuracy (at the 700 ms SOA). However, a joint improvement of RT and accuracy is present only in one case (see AC vs VC comparisons in Table 3). Therefore, formally, our first hypothesis, i.e., that responses in trials with the spoken cue would be faster and more accurate than responses in trials with no spoken cue, was not fully confirmed. However, what was confirmed is that, in most cases, the performance in trials with the spoken cue is better than in trials without it. Both accuracy and response time are important when making critical, time-sensitive decisions. While, ideally, one would want to improve both, the individual improvements seen in our experiment may still be quite valuable. We should note that this performance improvement is unlikely to be simply due to a foreperiod effect, for two reasons. Firstly, as we reported above, if a foreperiod effect were prominent in our experiment, it would have manifested itself first and foremost in the non-cued conditions (NC and VC), which is not the case. Secondly, in AC and AVC trials, unlike in NC and VC trials, the appearance of the pre-stimulus display is not a simple warning signal, as the cue is informative. At the presentation of the cue the participant needs to carry out three tasks: (1) interpreting the auditory cue, (2) orienting attention/gaze to the correct hemifield, and (3) preparing temporally for the main task (the main task being determining if the uniformed character is wearing a helmet or a cap). Only task (3) could contribute to what is normally described as the foreperiod effect and, yet, there seems to be no such effect in the NC and VC conditions, where only task (3) needs to be carried out.
Therefore, the faster RTs with longer SOAs in AC and AVC are more likely a consequence of the fact that longer SOAs are needed for the interpretation of the endogenous cue. Interestingly, a bimodal effect, where the RTs for AVC were significantly faster than those for AC, was observed only at the 500 ms SOA (see Table 3). So, formally, our second hypothesis, i.e., that responses in AVC trials would be faster and more accurate than responses in AC trials, was not fully confirmed. Upon reflection, in line with the literature on multisensory integration, one should not expect multisensory perception to provide advantages over unisensory perception if there is a low level of noise (so, no need for error correction) and enough time is available for making sense of the percepts. In our case, the spoken words (and also the moving lips) were noiseless. So, the most likely explanation for the superiority of AVC over AC only at the 500 ms SOA is the following: a 500 ms SOA does not always give enough time to process and react to the auditory cue in AC trials. So, the cue acts more like a distractor, resulting in the worst mean RT of all 12 pre-stimulus display and SOA pairs. In these more difficult conditions, also seeing lip movements (in the AVC condition) provides sufficient advantages to bring RTs back to the levels seen in the no-cue conditions. In more real-world conditions, where auditory cues would be much noisier (e.g., being conveyed via noisy radio communication to a soldier in a noisy environment), it is likely that bimodal perception would also help at longer SOAs. The combined effect of SOA and pre-stimulus on performance is well illustrated in Fig 4. This clearly shows not only that the audio-visual cues provide an advantage in both RTs and accuracy, but also that this effect is further enhanced by the longer SOAs.
The criticality of the choice of timing is also illustrated by the AC at 500 ms SOA, in which a dramatic increase of the RTs is the result of the presentation of the cue too close to the target stimulus. The VC conditions were the least accurate and produced the slowest RTs. This is not surprising, as the visual cue alone does not give any additional information, unlike the AC and AVC trials, and it might actually act as a distractor, compared to the NC conditions.

Fig 4

Accuracy in relation to RTs for each of the twelve different conditions.

Congruency effects in our experiment

As explained in the Background section, it is possible that in this experiment there was interference when the response location (left and right mouse click) was incongruent with the target location (in all trials) and/or the spoken cue (in AC and AVC trials). This, in turn, could possibly modulate the facilitatory effect of the bimodal cue. If such an effect was present, we would expect comparatively slower RTs and/or lower accuracy when stimuli and response location were incongruent than when they were congruent [45]. In the experiment, the Simon effect might be observed on its own (in NC and VC trials), or in combination with the spatial Stroop effect (in AC and AVC trials), as the spoken cue was 100% predictive of the stimulus location. On the contrary, any spatial Stroop effect cannot be separated from the Simon effect. An example of the two types of effects is illustrated in Fig 5 for an AVC trial.

Fig 5

An example of Spatial Stroop and Simon effects in an “incongruent” AVC trial.

Note that in our experiments the stimulus location is irrelevant to the task, as the task is to decide whether the character was wearing a cap (right click) or a helmet (left click) irrespective of where it was located on the screen. In the following subsections, we report the results of a series of post-hoc analyses aimed at elucidating to what degree the Simon and spatial Stroop effects could modulate performance and the potential benefits of bimodal cues in our experiment.
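The two congruency definitions used in the post-hoc analyses can be made concrete with a minimal sketch. It assumes only the response mapping stated above (cap = right click, helmet = left click); the `Trial` structure, its field names and the two function names are hypothetical and introduced purely for illustration, not taken from the analysis code of the study.

```python
from dataclasses import dataclass
from typing import Optional

# Response mapping stated in the text: cap -> right click, helmet -> left click.
RESPONSE_SIDE = {"cap": "right", "helmet": "left"}


@dataclass(frozen=True)
class Trial:
    """Hypothetical trial record (field names are illustrative assumptions)."""
    headgear: str             # "cap" or "helmet": the task-relevant feature
    target_side: str          # "left" or "right": task-irrelevant location
    cue_word: Optional[str]   # "left"/"right" in AC and AVC trials, None in NC and VC


def simon_congruent(trial: Trial) -> bool:
    """Simon congruency: target location matches the side of the correct response."""
    return trial.target_side == RESPONSE_SIDE[trial.headgear]


def stroop_congruent(trial: Trial) -> Optional[bool]:
    """Spatial Stroop congruency: spoken cue word matches the side of the
    correct response. Undefined (None) when no word is spoken (NC and VC)."""
    if trial.cue_word is None:
        return None
    return trial.cue_word == RESPONSE_SIDE[trial.headgear]


# An AVC trial with the cue word "left", the target on the left, and a cap
# (right click) is incongruent on both counts.
example = Trial(headgear="cap", target_side="left", cue_word="left")
print(simon_congruent(example), stroop_congruent(example))
```

Because the spoken cue always named the side on which the target appeared, the two functions return the same value in every AC and AVC trial, which is why the spatial Stroop effect cannot be separated from the Simon effect in those conditions.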

Congruency between stimulus location and response location

In order to separate the effect of the character location from the effect of the spoken cue word (in the AC and AVC conditions), we first performed a statistical analysis where only NC and VC trials were included. As there is no effect of the spoken word, in NC and VC trials the only interference with the response location would originate from the location of the stimulus (Simon effect). Note that we were not specifically interested in differences between NC and VC trials, but rather in differences between congruent and incongruent trials. A separate analysis for AC and AVC trials is presented in the next subsection. Mean RTs for NC and VC for different SOAs are shown on the left-hand side of Fig 6, for congruent and incongruent trials. Overall, RT averages for congruent and incongruent trials were quite similar: 494 and 497 ms, respectively. Indeed, a 2 × 3 × 2 ANOVA, where the factors were Simon congruency (congruent and incongruent), SOA (500, 700 and 900 ms) and pre-stimulus type (NC and VC), showed that none of the main effects or interactions were significant (all p > 0.1).
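As an illustration of how trial-level RTs might be arranged for such a 2 × 3 × 2 within-participant design, the sketch below averages RTs within each participant and cell (congruency × SOA × pre-stimulus type). The averaging step is an assumption about a typical repeated-measures workflow, not a description of the pipeline actually used; the function name, the tuple layout and the RT values are invented for illustration and are not data from the experiment.

```python
from collections import defaultdict
from statistics import mean


def cell_means(trials):
    """Average RTs per (participant, congruency, SOA, pre-stimulus) cell.

    `trials` is an iterable of tuples
    (participant, congruency, soa_ms, prestim, rt_ms); this layout is a
    hypothetical choice made for the example.
    """
    cells = defaultdict(list)
    for participant, congruency, soa_ms, prestim, rt_ms in trials:
        cells[(participant, congruency, soa_ms, prestim)].append(rt_ms)
    return {cell: mean(rts) for cell, rts in cells.items()}


# Toy values only (NOT taken from the study).
toy_trials = [
    (1, "congruent", 500, "NC", 480.0),
    (1, "congruent", 500, "NC", 500.0),
    (1, "incongruent", 500, "NC", 510.0),
    (1, "incongruent", 900, "VC", 495.0),
]

for cell, rt in sorted(cell_means(toy_trials).items()):
    print(cell, rt)
```

With 2 × 3 × 2 = 12 cells per participant, the resulting table has one mean RT per participant and cell, which is the usual input format for a repeated-measures ANOVA.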

Fig 6

Mean RTs with standard errors in congruent trials (blue) and incongruent trials (red), according to pre-stimulus display and SOA.

However, when analysing accuracy for congruent and incongruent trials for NC and VC (see left-hand side of Fig 7), a 2 × 3 × 2 ANOVA showed that there was a significant main effect of Simon congruency (F(1,27) = 6.227, p = .019, partial η2 = .18), where accuracy was 87% in congruent trials and 84% in incongruent trials. This is well illustrated in Fig 7, where all the blue markers (congruent trials) are distinctively separated from all red ones (incongruent trials). All the other main effects and interactions were non-significant (all p > 0.1).

Fig 7

Mean accuracy with standard errors in congruent trials (blue) and incongruent trials (red), according to pre-stimulus display and SOA.

The results of these analyses suggest that, while the Simon effect did not affect RTs, there was an effect on accuracy. The effect is clearly illustrated on the left of Fig 8, where most markers for congruent NC and VC trials are shifted-right (better accuracy) versions of the corresponding markers for incongruent trials.

Fig 8

Accuracy in relation to RTs for each of the twelve different conditions, for congruent (blue) and incongruent (red) trials.

Dashed arrows connect corresponding conditions in congruent and incongruent trials.

The lack of modulation in the response times is partially inconsistent with the literature on the Simon effect (where differences in both accuracy and RTs are normally reported). This may be explained by the fact that, although the target was always in either the left or the right hemifield, the eccentricity of the target within the hemifield varied, with some targets being very close to the center of the corridor (thereby possibly resulting in virtually no interference), while others were more clearly on one side. It is possible that this variable eccentricity results in performance differences between congruent and incongruent trials appearing only in the accuracy and not in the RTs.

Congruency between cue word and response location

In AC and AVC trials (where there was the spoken cue), there is another possible source of interference affecting RTs and accuracy: the incongruence between the spoken cue word and the response location (spatial Stroop effect). To test the spatial Stroop effect, 2 × 3 × 2 ANOVAs on both RTs and accuracy were performed for the AC and AVC trials, with congruency, SOA and pre-stimulus type as the three factors. The ANOVA on the RTs (see also right-hand side of Fig 6) showed that there was a significant main effect of congruency (F(1,27) = 32.795, p < .001, partial η2 = .55), with RTs being faster in the congruent condition than in the incongruent condition (463 and 503 ms, respectively). There was also a main effect of SOA (F(2,27) = 26.852, p < .001, partial η2 = .50), with RTs slower with the 500 ms SOA than with the 700 ms SOA, and slower with the 700 ms SOA than with the 900 ms SOA (the mean RTs being 507, 476 and 466 ms, for the 500, 700 and 900 ms SOAs, respectively), which confirmed the finding from previous analyses. The main effect of pre-stimulus and all the interactions were non-significant (all p > .1). Similar results were obtained in an ANOVA on the accuracy (see also right-hand side of Fig 7), with the main effects all significant: (a) congruency (F(1,27) = 16.018, p < .001, partial η2 = .37), with higher accuracy in the congruent condition than in the incongruent condition (89.9% and 85.8%, respectively); (b) SOA (F(2,27) = 6.003, p = .004, partial η2 = .18), with accuracies of 86.3%, 88.4% and 88.9% for the 500 ms, 700 ms and 900 ms SOAs, respectively; and (c) pre-stimulus (F(1,27) = 4.407, p = .045, partial η2 = .14), with accuracy being greater in AVC trials (89%) than in AC trials (87%). Pairwise comparisons for SOA showed a statistically significant difference between 500 and 700 ms (p = .013), and between 500 and 900 ms (p = .008). All interactions were non-significant (all p > .3). 
Therefore, there was strong evidence of a congruency effect between the spoken word and the response location on performance. The effect is clearly illustrated in the middle and right of Fig 8, where AC and AVC markers for congruent trials are effectively shifted-right (better accuracy) and down (faster RTs) versions of the corresponding markers for incongruent trials. However, as there was no significant interaction between congruency and pre-stimulus, we can conclude that the beneficial effect of the bimodal cue did not depend on whether or not there was congruency between the spatial features of cue and stimulus and the response location.
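For readers who wish to relate the reported F statistics to the partial η2 values, the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error) can be applied directly. The sketch below is only a worked illustration of that identity; the function name is hypothetical. One assumption is flagged explicitly: for the two SOA effects the error degrees of freedom are taken as 54 (2 × 27), which is the value under which the identity gives the reported effect sizes of .50 and .18, whereas the text reports F(2,27).

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported for the AC/AVC analyses.
# ASSUMPTION: df_error = 54 for the SOA effects (the text states 27).
reported = {
    "RT, congruency": (32.795, 1, 27),
    "RT, SOA": (26.852, 2, 54),
    "accuracy, congruency": (16.018, 1, 27),
    "accuracy, SOA": (6.003, 2, 54),
    "accuracy, pre-stimulus": (4.407, 1, 27),
}

for label, (f_value, df_effect, df_error) in reported.items():
    print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

Under these assumptions the printed values are .55, .50, .37, .18 and .14, matching the effect sizes given above.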

Discussion and conclusions

The main purpose of the study was to investigate the effectiveness of unimodal and bimodal cues at improving performance in a realistic decision-making task. In our experiment, the cues were presented to indicate the left or right location of the imminent target stimulus, and participants had to decide whether the target was wearing a helmet or a cap. The interval between cue and target (or SOA) could vary between 500 ms, 700 ms or 900 ms. Based on previous cuing research, we hypothesised that the auditory cue would give an advantage in the discrimination task, in terms of both RTs and accuracy. Additionally, based on previous multisensory literature showing that multisensory stimuli are in many cases processed faster and more accurately than unisensory stimuli, we expected that these auditory cue advantages would be enhanced when this was combined with a “visual cue”, which was in fact the face of a person uttering the cue words, resulting in a bimodal audio-visual cue. Our data confirmed that, in general, there was an advantage in providing a cue. However, unimodal (auditory) cues were only beneficial with an SOA of 900 ms for RTs and an SOA of 700 ms for accuracy. In fact, using an SOA of 500 ms with unimodal cues resulted in the worst mean RT of all pre-stimulus display and SOA pairs. We argued that this is because a 500 ms SOA does not give enough time to process and react to the auditory cue and, so, the cue can act like a distractor. We found, however, that bimodal cues (in the AVC condition) were more effective than unimodal cues for RTs at precisely this SOA (500 ms), for both RTs and accuracy, effectively bringing back the RTs to the levels seen in the no-cue conditions. This suggests that seeing lip movements provides a multisensory advantage which is sufficient to compensate the distracting effect of an incompletely processed auditory cue. This, of course, can only be confirmed by further investigations. 
Therefore, our two initial hypotheses, i.e., (1) that responses in trials with the spoken cue (both AC and AVC trials) are faster and more accurate than responses in trials with no spoken cue, and (2) that responses in trials with a bimodal cue are faster and more accurate than responses in AC trials, were only partially confirmed. We did not have any specific hypothesis in relation to the cue-stimulus intervals, as there was no clear indication from previous research of what the ideal temporal window should be to maximise the effect of the cues. As discussed above, we found that RTs tended to be faster with longer intervals, particularly in cued trials, probably because 500 ms was not enough to process the spoken cue. Our data also revealed a combined effect of SOA and pre-stimulus, as shown in Fig 8, showing a trend where the audio-visual cues provide an advantage in both RT and accuracy over audio-only cues and no cue, which is further enhanced with longer SOAs. It is important to note that the lip movements in bimodal cues were not understandable on their own (without the sound); however, they were slightly different for the two words and, therefore, after enough practice, participants could have implicitly learned to discriminate between the two, which might then have helped in the bimodal cues. Indeed, a limitation of this study was the lack of a condition that could examine to what extent the lip movements helped to process the cue. Another possible explanation of how the bimodal effect might occur (and of similar positive effects of multisensory stimuli reported in the literature) is that multisensory stimuli cause a “perceptual enhancement” [33]. In terms of practical lessons learnt from our study, in real-world scenarios, the person uttering the cue would not typically know how long after it the target would become visible. 
So, one could not choose beforehand the best modality of presentation, or, in fact, whether to withhold the cue to avoid distraction from/interference with the primary task. In this situation, our data show that one should always use bimodal cues, as they have proven to be the most resilient, never being worse, and in most cases being better (in terms of response speed, accuracy or both), than no-cue conditions. Naturally, future investigations will need to explore this aspect in more depth, and in particular, whether in more difficult conditions (e.g., in a noisy environment) bimodal cues can be even more effective at improving performance. Finally, in our study we performed post-hoc analyses to examine whether there was any interference from the Simon effect and the spatial Stroop effect. In particular, we wanted to clarify whether the inconsistent effect of bimodal cues on the response times and accuracy could be explained by an interference from the incongruency between the response location and the target stimulus location (in the Simon effect), or the spoken word (in the spatial Stroop effect), or both. We did not find evidence that the Simon and spatial Stroop effects limited the effect of bimodal cues. There was no evidence of a significant Simon effect on reaction times, but there was a significant effect on accuracy. Moreover, we found that there was a significant spatial Stroop effect on both RTs and accuracy, whereby responses were faster and more accurate when the spoken cue (“left” or “right”) was congruent with the response location (left click or right click). This is an important aspect and it needs to be taken into account as it can have practical implications in real-life tasks, which are often based on spatial features and manual responses. 
To conclude, our findings support the idea that, when used to facilitate perceptual decisions under time pressure, and in conditions of limited information in real-world scenarios, spoken cues can be effective in boosting performance (accuracy, reaction times or both), and even more so when presented in bimodal form. However, as seen in our scenario, cue timing plays a critical role and, if SOAs are too short (cf. 500 ms), cues may offer no advantage. While this work did not have the specific objective of elucidating the precise mechanisms behind any performance improvements provided by unimodal or bimodal cues in our experiment, it suggested a number of avenues for both real-world applications and theoretical investigation. For example, a new experiment should include a control condition with moving lips but without a spoken cue, and further research is needed to clarify to what degree noisy conditions change the effect of unimodal and bimodal cues. More generally, further experiments are needed to investigate whether the advantage of presenting multisensory cues extends to other contexts, different types of stimuli and non-spatial cues. Finally, more research is needed on the influence of the Simon and spatial Stroop effects in real-world situations and how these can be reduced using alternative interaction modalities.

18 Jan 2022

PONE-D-21-38703

Enhancing performance with multisensory cues in a realistic target discrimination task

PLOS ONE Dear Dr. Cinel, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I found one expert reviewer to comment on your work. In order to ensure a timely review process and to give you more precise guidance on how to potentially improve the manuscript, I decided to take the role of reviewing myself. Detailed suggestions can be found at the bottom of this letter. The referee considers your work important overall, yet there are some methodological issues that clearly prevent publication in its present form. This concerns the organisation of the manuscript, in particular the delineation of hypotheses, and the way the information is presented in the method section. I would invite you to prepare a revision of your work that addresses all concerns, together with a cover letter that provides point-by-point replies.

Please submit your revised manuscript by Mar 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Michael B. Steinborn, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf. 
2.Thank you for stating the following in the Acknowledgments Section of your manuscript: [ This article is an overview of UK MOD sponsored research and is released for informational purposes only. The contents of this article should not be interpreted as representing the views of the UK MOD, nor should it be assumed that they reflect any current or future UK MOD policy. The information contained in this article cannot supersede any statutory or contractual requirements or liabilities and is offered without prejudice or commitment. The authors acknowledge support of the UK Defence Science and Technology Laboratory (Dstl) and Engineering and Physical Research Council (EPSRC) under grant EP/P009204/1. This is part of the collaboration between US DOD, UK MOD and UK EPSRC under the Multidisciplinary University Research Initiative.] We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: [The work was funded by UK Defence Science and Technology Laboratory (Dstl) and Engineering and Physical Research Council (EPSRC) under grant EP/P009204/1 RP and LC received the award. https://epsrc.ukri.org/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.] Please include your amended statements within your cover letter; we will change the online submission form on your behalf. 3. We note that you have stated that you will provide repository information for your data at acceptance. 
Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. We noticed you have some minor occurrence of overlapping text with the following previous publication(s), which needs to be addressed: - https://ieeexplore.ieee.org/document/9215988 In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed.

Editor Comments

(-1-) theory and hypotheses

I would completely agree with R1 that your theorising on the effects of multisensory cues is a bit confusing and so are the predictions derived from your theorising. I would suggest specifying what underlying process is assumed in your situation and what process model is assumed (or could be adopted) to predict performance effects on reaction time and accuracy. At present, you are debating general processes related to multisensory processing, but this does not seem to fit exactly what you are examining empirically.

(-2-) downward-sloping SOA function

More specifically, it seems you are predicting a downward-sloping foreperiod function (the variable foreperiod effect, see doi:10.1037/xhp0000561; doi:10.1016/j.actpsy.2008.08.005), which you attribute to mechanisms of inhibition of return, which is incorrect in the present situation. I suggest reconsidering your theorising in the revised version of the manuscript.

(-3-) inhibition of return

The theorising on inhibition of return (IOR) is incorrect throughout the manuscript. Note that IOR is not an effect related to long intervals per se. Inhibition of return refers to the way organisms gather information from visual systems by looking. For example, in a saccadic walk during the perception of a visual scene, the information processing system tends "not" to jump back to a previous saccadic point because it would prevent the system from gathering the information efficiently. It makes no sense to confuse the foreperiod effect with IOR.

Reviewers' comments: Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: No

2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified. Reviewer #1: No

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes

5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study investigated how the efficiency of target discrimination changes depending on whether the pre-cue stimulus, which indicates the location of target presentation, was multisensory or unisensory. The main result is that multisensory cueing can be more beneficial for target discrimination than unisensory cueing. The authors concluded that processing more than one modality increases the benefit of the cue by some processes. The paper addressed an interesting topic, and the results have the potential to be worthwhile to publish. However, I have some points that should be addressed prior to publication. I list these in my detailed review below. Particular care should be given to point 1, which concerns the fact that it is not clear how the experimental situation in this study can be considered in the context of the previous studies, and this could pose problems in the overall interpretation of the results. Point 7 may help address this point.

1. The explanation in 61-82 needs to be clarified further in light of the authors' experimental situation. How does multisensory cueing increase the effectiveness of cues in directing spatial attention? The authors cited some studies, such as [37], that show multisensory cues guide attention better than unisensory cues. From this, the authors seem to hypothesize that multisensory cues work better than unisensory cues in their experiments. However, the literature cited here may have examined the effects of combining unisensory cues that function on their own. If so, then there is a problem with the connection between the previous studies and this hypothesis. The reason is that the visual cue in this study did not work by itself, as is mentioned after the results section. Additionally, the auditory cue was not very effective on its own. 
This point is important because it can critically affect the reason for conducting each analysis and the overall interpretation of the results. Is there a different logic from the one I described?

2. In l.101, does the “task-relevant stimulus” refer only to the target or to both the target and the auditory cue? Is the auditory cue heard from the spatial center, or from the left or right side of the screen? It would be better to explicitly state whether the auditory cue guides exogenous attention (e.g., a sound physically presented from the left or right) or endogenous attention (e.g., a centrally presented semantic stimulus) to avoid misunderstanding by the reader.

3. Is VC the motion of the face with the lips uttering the word indicating “left” or “right” in the center of the screen as a visual cue (VC)? If my understanding is correct, this seems to be a subtle stimulus that may or may not work as a “cue.” If the participants cannot instantly understand the meaning of the VC, it will not work on its own to guide attention. Is there any data that investigates this point? At least, it is necessary to describe from the subjective point of view whether the VC is a stimulus that can be understood immediately. From the description after the results, it appears that VCs do not work by themselves, but it would be easier to understand if this were mentioned in the introduction or in the methods section.

4. The experimental procedure section should contain details about the stimuli to be presented. For example, how many centimeters away from the participant were the images presented, and how large were they? Also, how many degrees of visual angle separated the center from the cues or targets that appeared? Without these details, the reader may be misled. In my case, I first thought that the positions of the targets presented on the left and right sides were fixed, but later I realized from the Results section that there was some variation. 
Note that, as mentioned in point 2, the location of the source of the auditory stimuli should also be described.

5. A more systematic arrangement of figures and tables throughout the Results section might be needed.

6. The authors should provide effect sizes for statistical tests. Also, it would be easier to understand the data if variability information were added to all the graphs.

7. In addition to the Stroop effect and the Simon effect, there is another effect that could interfere with the effect of cueing in this study: the foreperiod effect. It has been widely pointed out in the research field of the cognitive function called temporal preparation that, in an experimental situation where the target is presented after different foreperiods, the degree of readiness for the target is low at the beginning of the trial, but increases over time. The results of this study might be explained in terms of the combined effect of temporal preparation and the difference in the time it takes to process each cue. If possible, I would like to see some analysis or discussion that can shed light on this perspective. This interpretation seems to be more valid than citing examples where spatial attention is efficiently guided by the integration of multiple unisensory spatial cues which are valid on their own. This is because neither the auditory nor the visual unisensory cues had a major effect on the efficiency of target discrimination on their own.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

21 Apr 2022

Please see Response to Reviewers document. Submitted filename: Response to Reviewers.pdf

26 May 2022

PONE-D-21-38703R1

Enhancing performance with multisensory cues in a realistic target discrimination task

Editor comments. The referee commented on your work and found most of the issues properly addressed, despite some remaining points. I also read the paper myself, and in agreement with the referee, I think it is close to the acceptance stage. I would ask you to address the remaining points before the manuscript will be officially accepted. This means, there will be no further reviewer round and the final revision serves only to give the manuscript the proper fine-tuning. Please submit your revised manuscript by Jul 10 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Michael B. Steinborn, PhD
Section Editor
PLOS ONE

Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thanks to the Authors for responding to my comments. The revised manuscript addressed my points adequately. I now understand that the authors' purpose is to provide a practical guideline, not to reveal multisensory cueing mechanisms theoretically. I have two final points that should be addressed:

I think the difficulty in discussing the results is derived from the fact that each cue may have a different length of interpretation time (see lines 433-436 in the marked-up version of the manuscript). We agree that the temporal preparedness with the definitions used by the authors might not affect the results. On the other hand, I think it is worth noting that different interpretation times for each cue affected the results, as described in lines 603-612 in the marked-up version of the manuscript. In such a case, I do not think that the results comparing the effect of cue type on performance within each SOA would have allowed the authors to compare the effect of cues eliminating differences in SOA in terms of the psychological factors. Although the authors appeared not to explicitly state that this analysis was to remove the effect of SOA theoretically, readers might interpret it that way.
Suppose the authors note that examining the effects of cues within each SOA is only meant to provide a practical guideline for the authors' purposes, not to remove the effects of SOA theoretically. In that case, the use of these statistical analyses might be acceptable.

The error bars of different colors in Figures 6 and 7 are covered, and some are not visible. Therefore, it is recommended that the error bars be separately presented, as shown in Figures 2 and 3.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

9 Jul 2022

See Response to Reviewers document.

18 Jul 2022

Enhancing performance with multisensory cues in a realistic target discrimination task

PONE-D-21-38703R2

Dear Dr. Cinel,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Michael B. Steinborn, PhD
Section Editor
PLOS ONE

28 Jul 2022

PONE-D-21-38703R2

Enhancing performance with multisensory cues in a realistic target discrimination task

Dear Dr. Cinel:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Michael B. Steinborn
Section Editor
PLOS ONE

37 in total

Review 1. Multisensory perception: beyond modularity and convergence.

Authors: J Driver; C Spence
Journal: Curr Biol Date: 2000-10-19 Impact factor: 10.834

2. Cross-modal illusory conjunctions between vision and touch.

Authors: Caterina Cinel; Glyn W Humphreys; Riccardo Poli
Journal: J Exp Psychol Hum Percept Perform Date: 2002-10 Impact factor: 3.332

Review 3. Crossmodal spatial attention.

Authors: Charles Spence
Journal: Ann N Y Acad Sci Date: 2010-03 Impact factor: 5.691