Effects of degraded speech processing and binaural unmasking investigated using functional near-infrared spectroscopy (fNIRS).

Xin Zhou^1, Gabriel S Sobczak^2, Colette M McKay^3,4, Ruth Y Litovsky^1,5,6.

Abstract

The present study aimed to investigate the effects of degraded speech perception and binaural unmasking using functional near-infrared spectroscopy (fNIRS). Normal hearing listeners were tested when attending to unprocessed or vocoded speech, presented to the left ear at two speech-to-noise ratios (SNRs). Additionally, by comparing monaural versus diotic masker noise, we measured binaural unmasking. Our primary research question was whether the prefrontal cortex and temporal cortex responded differently to varying listening configurations. Our a priori regions of interest (ROIs) were located at the left dorsolateral prefrontal cortex (DLPFC) and auditory cortex (AC). The left DLPFC has been reported to be involved in attentional processes when listening to degraded speech and in spatial hearing processing, while the AC has been reported to be sensitive to speech intelligibility. Comparisons of cortical activity between these two ROIs revealed significantly different fNIRS response patterns. Further, we showed a significant and positive correlation between self-reported task difficulty levels and fNIRS responses in the DLPFC, with a negative but non-significant correlation for the left AC, suggesting that the two ROIs played different roles in effortful speech perception. Our secondary question was whether activity within three sub-regions of the lateral PFC (LPFC) including the DLPFC was differentially affected by varying speech-noise configurations. We found significant effects of spectral degradation and SNR, and significant differences in fNIRS response amplitudes between the three regions, but no significant interaction between ROI and speech type, or between ROI and SNR. When attending to speech with monaural and diotic noises, participants reported the latter conditions being easier; however, no significant main effect of masker condition on cortical activity was observed. 
For cortical responses in the LPFC, a significant interaction between SNR and masker condition was observed. These findings suggest that binaural unmasking affects cortical activity through improving speech reception threshold in noise, rather than by reducing effort exerted.

Year: 2022 PMID: 35468160 PMCID: PMC9037936 DOI: 10.1371/journal.pone.0267588

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

Listening to speech can be challenging in many situations, for instance, when communicating in a “cocktail party” scenario whereby the listener is surrounded by multiple conversations [1]. Limitations also arise when individuals with hearing loss need to extract information from acoustically degraded speech [2]. Listeners may expend elevated cognitive resources in a challenging condition to retain accuracy in speech perception [3]. On a long-term basis, chronically-elevated cognitive resources while listening to sounds in the environment could result in fatigue, decreased quality of life and reduced work efficiency [4]. Effortful speech processing is closely tied to auditory cortex (AC) activity involved in sound perception, and to lateral prefrontal cortex (LPFC) activity associated with higher-level speech understanding. However, as reviewed below, studies to date report different sub-regions of the LPFC responding to degraded speech. Conflicting evidence also exists with regards to LPFC activity in response to different spatial configurations of noise relative to the target speech. The current study focused on binaural unmasking, whereby speech understanding can improve when target speech is spatially separated from masking noise compared to when the target and maskers are co-located [5-9]. We were primarily interested in whether functional near-infrared spectroscopy (fNIRS) could reveal differences in cortical activities between the left LPFC and AC in the context of binaural unmasking at different speech-to-noise ratios (SNRs). Our secondary question was whether there were functional differences across sub-regions within the LPFC in varying stimulus configurations.

fNIRS measures of cortical activity

The fNIRS imaging method uses near-infrared light that travels through the superficial cortical areas, with some of the light photons being absorbed by the chromophores and some being scattered [10]. By measuring changes in light intensity as a function of time, fNIRS reveals the concentration changes of oxygenated and deoxygenated hemoglobin, denoted as ΔHbO and ΔHbR respectively, in the local cortical area contained within the pathway of the infrared light. The concentration changes in hemoglobin (called hemodynamic responses) are thought to be closely related to the neuronal activity in the cerebral tissue through neurovascular coupling [11]. Previous studies have shown that fNIRS measures are closely related to the blood-oxygen-level-dependent (BOLD) signal from functional magnetic resonance imaging (fMRI), with strong correlations shown between BOLD signals and ΔHbO [12], between BOLD and ΔHbR [13, 14], and between BOLD signals and the total concentration changes in hemoglobin, i.e., ΔHbT [15]. The difference between ΔHbO and ΔHbR, i.e., ΔHbO − ΔHbR, which assesses changes in cerebral oxygenation (ΔHbC), has also been used to reveal changes in neuronal activity in the prefrontal cortex [16-18]. One advantage of fNIRS over fMRI is its compatibility with ferromagnetic materials (e.g., metal implants); fNIRS is therefore well suited to measuring cortical activity in populations for whom fMRI scanning is precluded [19]. Further, unlike fMRI, which produces loud scanning noise, fNIRS is silent and has been popular for research involving auditory stimuli [20-26].

Neuroimaging studies revealed evidence of binaural unmasking

The current study was designed to garner evidence for binaural unmasking based on cortical activity measured using fNIRS. To induce binaural unmasking, we presented monaural speech stimuli to the left ear with noise presented either monaural (co-located) or diotic (both ears with no interaural difference in time or intensity). These configurations were tested with speech that was either unprocessed or vocoded. When comparing the monaural and diotic noise conditions, improved performance in speech reception thresholds is indicative of binaural unmasking, which facilitates source segregation and improved speech intelligibility in noisy environments [5-9]. Neural representations of binaural unmasking have been recorded in the brainstem, mainly in the inferior colliculus [27, 28] and in the AC of mammals [29]. In humans, previous studies investigated the neural origins of binaural unmasking using techniques such as brainstem auditory evoked potentials, frequency-following responses, and auditory steady-state responses. However, these studies did not demonstrate evidence of binaural unmasking at the brainstem level [see ref. 30 for reviews]. The absence of observed subcortical contributions to binaural unmasking in humans could be due to the limited spatial resolution of the neuroimaging systems. It could also be due to some unknown mechanisms that are likely specific to humans and need further exploration. At the cortical level, while some studies demonstrated effects of binaural unmasking in auditory areas [31, 32], other studies did not [33, 34]. The inconsistencies of results could be due to differences between studies in the stimuli and task conditions that were used. Interestingly, neural correlates of binaural unmasking were reported [35], with greater N1 amplitude in the unmasked versus masked condition in NH adults and in adults with atrophy in the brainstem including the inferior colliculus. 
This result suggested that in humans, effects of binaural unmasking persist at the cortical level even with severe damage in the brainstem. A previous fMRI study [36] investigated the neural correlates of binaural unmasking by presenting speech and noise in conditions with diotic stimuli in two ears, or with phase-inverted speech or noise in one ear (dichotic). They found an area in the left inferior frontal gyrus (IFG) that showed significant differences in BOLD signals when speech or noise was phase-inverted in one ear, compared to in the diotic condition. A recent fNIRS study [37] focused on the role of auditory spatial attention while listening to speech with an informational masker in which speech and noise were either spatially separated or co-located. They found that two areas of the lateral frontal cortex (LFCx) showed significantly greater activity when target speech and masker were spatially separated versus co-located. The authors interpreted their findings to suggest that both sides of the LFCx on both hemispheres could be involved in spatial attention processing and binaural unmasking, both of which can contribute to improving speech perception. To summarize, as both the IFG and LFCx in the above two studies [36, 37] were within the LPFC, effects of binaural unmasking in humans have been associated with changes in cortical activity in the LPFC on both sides and in the AC.

LPFC and AC contribute differently to speech perception

The bilateral AC and left LPFC are closely connected when auditory perception, phonological and semantic processing are considered [see 38 for meta-analyses, 39]. However, these brain regions have been reported to show different response patterns in speech perception experiments. A neuroimaging study using positron emission tomography (PET) investigated speech perception of sentences in noise [40]. As the level of masking noise decreased, speech intelligibility improved, as expected; the left anterior AC showed increased responses and left IFG showed decreased responses in the regional cerebral blood flow (rCBF). A previous fMRI study [41] investigated cortical activity for auditory identification (‘ba’ versus ‘da’) with varying levels of masking noise. The results showed that, as the masking noise level decreased, the identification accuracy increased, and was positively correlated with bilateral AC activity. The reaction times in identifying the targets were longer at a medium level of masking noises [41], and shorter when the task was hard or very easy. Further, reaction times were positively correlated with activity in the left IFG (anterior insula-operculum), suggesting that left IFG was associated with task demands hence possibly varying listening effort. Another study on this topic using fNIRS [42] examined speech perception in noise in naturalistic scenes and found elevated cortical activity in the right dorsolateral prefrontal cortex (DLPFC) as SNR decreased, task demand increased and speech was reported as being more difficult to understand. Different effects of spectral degradation on cortical activity of the AC and LPFC have also been observed when listeners performed auditory perception tasks. For instance, a previous fNIRS study [23] examined cortical responses for spectrally degraded speech at varying levels of intelligibility (0, 25, 50, 75, and 100%). 
They found that, within the range of conditions when the speech was intelligible (between 25% and 100% correct), as the intelligibility increased, responses in the ACs increased and responses in the left IFG decreased. Their results in the AC replicate previous findings showing that the magnitude of cortical responses in the AC is sensitive to, and positively correlated with, speech intelligibility [40, 43–45]. When speech intelligibility was reduced (but above zero) and the task demands were higher, increased cortical activity in the left IFG likely reflected greater cognitive resources being exerted to understand speech in more challenging conditions. Similarly, increased activity in the left IFG when attending to degraded speech versus unprocessed speech has been reported in previous studies using fMRI [46, 47] and fNIRS [20]. The AC was activated regardless of whether attention was directed towards speech or distractors, suggesting that attention did not alter cortical responses in the AC as much as in the LPFC [20, 46]. Taken together, the above studies suggest that the left AC is sensitive to SNR and spectral degradation. The LPFC shows quite different response patterns from the AC, and attention plays an important role in modulating activity in this region.

Sub-regions of the LPFC

Previous studies agree that the LPFC is involved in effortful speech perception in challenging conditions, though with mixed results regarding which subregions of the LPFC are involved. For instance, the DLPFC on the left [36] and right [37] seem to be involved in attentional listening to speech in noise and spatial processing, with hemispheric differences possibly related to differences in experimental configurations. The left IFG seems to be involved in effortful perception of speech as speech intelligibility decreases, either due to spectral degradation or masking noise. The differences in regions involved could be partially due to the varying number of optodes employed in different fNIRS montages across experiments, which would have affected the size of the recording region (surface area). For instance, the 4 channels on the LFCx in one fNIRS study [37] permitted a greater recording area than the 3 channels on the IFG in another study [20]. Without good coverage of the frontal area, it is difficult to parse out whether LPFC subregions overlap and share common functions, and whether the regions reported in the fNIRS studies [20, 23, 37] overlap with the regions reported in fMRI studies [36, 46, 47]. Alternatively, different subregions within the LPFC might contribute differently to binaural unmasking and the processing of spectrally degraded speech in noise.

Goals of the current study

The current study investigated the effects of spectrally degrading speech and binaural unmasking on cortical activity measured using fNIRS. Our primary interest was in fNIRS measures in the DLPFC and AC. Because of the unclear roles of LPFC subregions, our secondary research question was whether fNIRS measures could reveal differences among the subregions within the LPFC with varying stimulus configurations. Besides the DLPFC, we also examined two adjacent regions of interest (ROIs) within the LPFC, which corresponded on the surface to Brodmann areas (BA) 9 and 46, and to BA45 and BA47, respectively. In addition to fNIRS measures, this study collected subjective ratings of task difficulty and measured the accuracy of speech intelligibility. We predicted that, due to binaural unmasking, listeners would report listening to speech with diotic noise to be easier than with monaural noise. In addition, we predicted that unprocessed speech would be easier than vocoded speech. We hypothesized that when task demands were higher but not impossible to meet, and assuming that listeners remained motivated, more cognitive resources would be spent [3, 48], manifested as greater fNIRS responses in the LPFC, compared to when task demands were lower. For our primary research question, based on previous research, we predicted that fNIRS responses in the AC would show an opposite trend compared with the response pattern in the DLPFC across conditions. For our secondary research question, we predicted that if varying configurations, i.e., spectral degradation, binaural unmasking, and noise level, had different effects on each sub-region in the LPFC, we would see different patterns between the three subregions.

Methods

Participants

Twenty-seven volunteers were recruited for the fNIRS session. Four volunteers were excluded at the beginning as ‘acceptable’ light intensity for fNIRS data collection was not obtained due to hair artifacts. Twenty-three adults advanced to the testing phase (13 women; mean and standard deviation (SD) of ages: 22.7 ± 3.1 years, range 19–30 years; 21 right-handed). These participants were recruited through a university-run online job posting site at the University of Wisconsin-Madison and were paid for their time. A different group of 15 volunteers (13 women, mean ± SD: 21.7 ± 2.8 years, range 19–29 years) were recruited to participate in a separate behavioral speech perception task. These were undergraduate students at the University of Wisconsin-Madison and participated in the study for credits. All the participants were native English speakers (none were bilingual) with normal pure tone thresholds at or less than 20 dB HL with less than 10 dB difference between two ears at octave frequencies between 125 Hz and 8000 Hz. Experimental protocols were within standards set by the National Institutes of Health and approved by the University of Wisconsin–Madison’s Human Subjects Institutional Review Board. All participants provided written consent.

Stimuli

The speech stimuli consisted of matrix sentences each having the same structure with monosyllabic words from 5 categories: name, verb, number, adjective, object, with 8 words in each category [49]. An example sentence is: ‘Bob sold six blue socks.’ For each sentence, in each of the 5 categories of words, one of 8 options was randomly chosen, thereby creating grammatically correct but unpredictable sentences. Both unprocessed (U) and vocoded (V) versions of the sentences were used. The speech was vocoded using a white-noise carrier whereby the spectrum was divided into eight frequency bands between 200 Hz and 7000 Hz, i.e., 8-channel noise-vocoded [2], with filters based on Greenwood functions. The noise stimuli were 8-channel noise-vocoded 4-talker babble of IEEE sentences [50]. Both the matrix sentences and the IEEE sentences were recorded by American woman speakers. The speech (S) stimuli were delivered to the left ears, i.e., monaurally (m) through an ER-2A insert earphone (Etymotic). Two noise (N) configurations were implemented. In the first configuration, noise was presented to the ear with the speech stimulus, i.e., monaurally at 60 dBA (F, maximum level with A-weighted frequency response and Fast time constant), referred to as NmSm. In the second configuration, the noise was presented to both ears, i.e., diotically, at 57 dBA (F) with no interaural time differences, referred to as NoSm. The 3-dBA reduction in the NoSm condition was introduced to compensate for the otherwise doubling of intensity, so that the sound pressure level would be equalized between NmSm and NoSm conditions. Speech was presented at two SNRs, -15 and -10 dB. The same noise stimuli were presented in the unprocessed and vocoded sentence conditions.
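The 3-dBA adjustment follows from the decibel definition: doubling acoustic intensity raises the level by 10·log10(2), roughly 3 dB. The short Python sketch below reproduces that arithmetic. The function name is illustrative and not from the study, and the sketch treats the two-ear presentation as a plain doubling of intensity, as the text does.

```python
import math

def level_change_db(intensity_ratio: float) -> float:
    """Level change in dB for a given ratio of acoustic intensities."""
    return 10.0 * math.log10(intensity_ratio)

# Presenting the same noise to two ears instead of one is treated here
# as a doubling of intensity (the simplification used in the text).
doubling_db = level_change_db(2.0)        # about 3.01 dB
diotic_level = 60.0 - round(doubling_db)  # per-ear level for NoSm, in dBA
```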

fNIRS data collection and signal processing

fNIRS system and experimental montage

The fNIRS system used in this study (NIRScout, NIRX medical technologies, LLC) was a continuous-wave NIRS instrument with 16 LED light sources (Fig 1A, red dots) and 16 avalanche photodiode (APD) detectors (Fig 1A, blue dots). Each LED light source emitted near-infrared light with wavelengths of 760 nm and 850 nm. Each source paired with an adjacent detector 3 cm away formed a measurement channel (Fig 1A, yellow lines). A NIRScap (NIRX medical technologies, LLC) was used to hold the light sources and detectors on the head.

Fig 1

fNIRS montage and a priori regions of interest (ROIs).

The fNIRS montage was symmetric between hemispheres; panel (A) plots the connection of light sources (red, n = 8) and detectors (blue, n = 8), and channels (yellow lines) on the left hemisphere. The green dots denote detectors that provide 8-mm channels, with 4 on each side. Panel (B) shows channels comprising the three subregions within the left lateral prefrontal cortex (LPFC) and the left auditory cortex (AC). The three regions were the dorsolateral prefrontal cortex (DLPFC), and two adjacent regions of interest, i.e., f-ROI2 and f-ROI3. The colors are the sensitivity profiles, in units of log10 mm⁻¹, generated from AtlasViewer [51].

Because fNIRS signals of interest (neuronal activity-related changes in hemoglobin from cerebral tissue) are contaminated by responses from the extracerebral tissue, such as systemic and non-evoked brain responses [52], it is essential to reduce such confounds and improve the neural signal quality. In the current study, a bundle of 8 detectors (Fig 1A, 4 green dots on each side) was used; these were situated 8 mm from the light sources, providing “short channels” [53]. The short-channel photon path was shallow and expected to reveal responses only in the superficial extracerebral tissue, not in the cerebral tissue [54]. Regressing out the short-channel components from the regular fNIRS channels has been shown to improve fNIRS signal quality [53, 55–58], i.e., the ratio of cerebral to extracerebral components.

fNIRS responses were examined in three sub-regions in the LPFC and in the AC on both hemispheres. Each sub-region consisted of three channels (Fig 1B). On the surface, the DLPFC corresponded to Brodmann areas (BA) 9 and 10, f-ROI2 corresponded to BA 9 and 46, and f-ROI3 corresponded to BA45 and BA47, which likely covers the IFG. Fig 1B plots the sensitivity profile of each region on the left side [59]. 
The sensitivity profiles were generated with AtlasViewer software [51] and revealed the light intensity changes over the given area underneath channels in each region.

Data collection

The fNIRS data were collected in a standard IAC sound-attenuated booth, with participants sitting in an armchair wearing a NIRSCap of predetermined size for a snug fit around the head. During preparation, to center the cap and correctly position the optodes, Cz was positioned halfway between the nasion and inion and halfway between the two pre-auricular points. Further, the frontal optode at Fp1 was positioned at 10% of the nasion-inion distance (a few centimeters above the eyebrows). The cap was attached to a chest wrap for fixation. The gains of light intensity at the APD detectors were then checked to ensure that all the channels had at least ‘acceptable’ light intensity. If some channels did not show good intensity, the most likely causes were that the optodes were not perpendicular to the scalp or that hair strands were interfering with the photon path. To rectify this problem, the optodes were taken out and the hair underneath was gently pushed away to create better contact with the skin before the optodes were replaced. The optimization procedure was repeated until most of the optodes received at least acceptable light intensity. Four out of 27 participants with fewer than half of the channels showing acceptable light intensity were excluded from the study, with no further fNIRS data being collected. A pseudo-random block design was implemented for fNIRS data collection, consisting of six 9-minute testing periods. Each testing period (Fig 2B) started with a 30-second silent period for baseline data collection, followed by blocks of stimuli, each drawn from one of the four listening conditions, i.e., unprocessed or vocoded speech, with either monaural noise on the speech side or diotic noise (see Fig 2A) at one SNR (-10 or -15 dB). In each testing period, 3 blocks per condition were presented in random order. Among the six testing periods, the order of the 2 SNRs was randomized. 
Each block lasted 13.6 s and consisted of 5 sentences, with a 0.6-s interval between sentences. After each block, there was a jittered silent period (25 to 35 s in duration). In total, nine blocks of stimuli per condition were presented. The experiment was run in the Presentation® software (https://www.neurobs.com/), which is a stimulus delivery and experiment control platform.
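The timing figures above can be cross-checked with simple arithmetic. The Python sketch below is only a consistency check, not part of the study's software; it assumes the jittered silence averages to the midpoint of its range and that the six testing periods are split evenly between the two SNRs.

```python
BASELINE_S = 30.0            # silent baseline at the start of each period
BLOCK_S = 13.6               # duration of one stimulus block
REST_RANGE_S = (25.0, 35.0)  # jittered silence after each block
CONDITIONS_PER_PERIOD = 4    # 2 speech types x 2 masker configurations
BLOCKS_PER_CONDITION = 3     # per testing period
N_PERIODS = 6
N_SNRS = 2

blocks_per_period = CONDITIONS_PER_PERIOD * BLOCKS_PER_CONDITION
mean_rest_s = sum(REST_RANGE_S) / 2  # assumes symmetric jitter
period_min = (BASELINE_S + blocks_per_period * (BLOCK_S + mean_rest_s)) / 60

# Assumption: three periods at each SNR.
periods_per_snr = N_PERIODS // N_SNRS
blocks_per_cell = periods_per_snr * BLOCKS_PER_CONDITION
```

Under these assumptions a period lasts a little over 9 minutes and each combination of speech type, masker and SNR receives nine blocks, in line with the figures reported in the text.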

Fig 2

Listening conditions and diagram of fNIRS data collection.

In panel (A), the white loudspeakers are for unprocessed (U) and vocoded (V) speech; the black loudspeakers are for noise (N), monaurally (NmSm) or diotically (NoSm) presented. Panel (B) shows the pseudorandom block design used for data collection, with stimuli in 4 listening conditions (blue boxes) being presented in random order in each session.

Participants were required to attend to the speech stimuli by counting the number of color words in each block and then to click a mouse button to respond immediately after the block was finished. Participants were instructed to click the left or right buttons, when the number of recognized color words was even or odd, respectively, and to click the middle button (scroll wheel) if they did not understand any of the words. A practice session prior to fNIRS testing was conducted with each participant, to familiarize them with varying configurations of speech, the pattern of the fNIRS data collection, and the task. During practice, they listened to the vocoded speech in quiet and in noise. A block design was used with varying lengths of silence between blocks and participants were required to do the same task as described above. Verbal instruction was given by the experimenter; text instruction was available either on a brochure or a monitor 1.5 m in front of the participants throughout the testing.

fNIRS signal processing

The fNIRS signals recorded by the NIRScout system were imported into MATLAB (R2017a) for further analysis, using software that was either written by the authors or adapted from Homer2 scripts [60]. A short-channel subtraction method was applied: principal components were extracted from the eight 8-mm channels and regressed out of the regular fNIRS channels to reduce the extracerebral components in the fNIRS data [58]. The steps of signal processing were as follows (see Fig 3).

Fig 3

Diagram of fNIRS signal processing.

1) Remove step-like noise

Step-like noise can be caused by a sudden loss of contact between optodes and the skin, or interposition of hair, during data collection. To remove step-like artifacts in the data (y) of each channel, the first difference of y was computed as X = diff(y). Any element of X whose absolute value exceeded the mean of abs(X) by more than two SDs was set to zero, i.e., if abs(X) > mean(abs(X)) + 2*std(X), then X = 0. The response with step-like artifacts removed was then recovered as the cumulative sum of the updated X, i.e., ypost = cumsum(X).
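The step-removal rule can be sketched in a few lines of Python. This is an illustrative re-implementation of the rule as worded above, not the authors' MATLAB code. The function name is hypothetical, the sample SD is used to mirror MATLAB's default std, and the first sample is added back so the output keeps the input's offset and length (an assumption, since cumsum(X) alone starts from the first difference).

```python
from itertools import accumulate
from statistics import mean, stdev

def remove_steps(y, n_sd=2.0):
    """Zero out unusually large sample-to-sample jumps, then re-integrate."""
    x = [b - a for a, b in zip(y, y[1:])]  # X = diff(y)
    threshold = mean(abs(v) for v in x) + n_sd * stdev(x)
    x = [0.0 if abs(v) > threshold else v for v in x]
    # cumsum(X) rebuilds the signal relative to its first sample;
    # y[0] is added back here to preserve the original offset.
    return [y[0]] + [y[0] + s for s in accumulate(x)]

# A small oscillation with a 5-unit step halfway through.
raw = [0.0, 0.1, 0.0, 0.1, 0.0, 5.1, 5.0, 5.1, 5.0, 5.1]
cleaned = remove_steps(raw)
```

In this toy signal the single large jump is zeroed, so the cleaned trace keeps oscillating around its starting level instead of shifting upward.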

2) Exclude ‘poor’ channels

Channels of ‘poor’ data quality should be excluded from further analysis. Because heartbeat signals are the most salient signals in fNIRS measurements, channels that fail to record them are unlikely to record other physiological or neural responses. To quantify the heartbeat signals, the correlation between the heartbeat signals (0.5–1.5 Hz) in the intensity data of the two NIR wavelengths [61], i.e., the scalp coupling index (SCI), was calculated. In the current study, the SCI cut-off threshold was set at 0.15, the same threshold used in [53], to ensure that each participant had at least 4 of the 8 short channels remaining for further analysis [58]. Our previous study [53] also demonstrated that a threshold of SCI ≥ 0.15 and the recommended threshold of SCI ≥ 0.75 [61] resulted in comparable signal quality after short-channel subtraction, measured as contrast-to-noise ratios. Further, keeping channels with SCI ≥ 0.15 ensured that short channels over both the frontal and temporal cortex, which measured extracerebral responses from both ROIs, were included in further analysis. The mean ± SD percentages of regular channels and short channels that were excluded were 2.45% ± 4.46% and 11.05% ± 14.75%, respectively.
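As a rough illustration of the SCI criterion, the Python sketch below correlates the two wavelength signals of one channel and applies the 0.15 cut-off. It assumes the inputs have already been band-pass filtered to the cardiac band (0.5–1.5 Hz); that filtering step is not shown, and the function names are hypothetical rather than taken from Homer2 or the authors' scripts.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def keep_channel(cardiac_760, cardiac_850, threshold=0.15):
    """Return (sci, keep) for one channel.

    Inputs are assumed to be the intensity signals at the two wavelengths
    after band-pass filtering to the cardiac band.
    """
    sci = pearson_r(cardiac_760, cardiac_850)
    return sci, sci >= threshold

# Synthetic 1 Hz "heartbeat" sampled at 10 Hz for 10 s.
t = [i / 10 for i in range(100)]
pulse = [math.sin(2 * math.pi * s) for s in t]
coupled = [0.5 * v for v in pulse]                      # same pulse, scaled
uncoupled = [math.cos(2 * math.pi * s) for s in t]      # quarter-cycle shift

sci_good, keep_good = keep_channel(pulse, coupled)
sci_bad, keep_bad = keep_channel(pulse, uncoupled)
```

A channel where both wavelengths carry the same pulse yields an SCI near 1 and is kept, whereas the uncorrelated pair falls below the cut-off.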

3) Preprocess and calculate the ΔHbO and ΔHbR

Light intensity data in individual channels were first converted to optical density [60]. A wavelet decomposition method proposed in [62] was then applied to correct motion artifacts, which can be caused by physical displacement of the optodes from the surface of the participant’s head. In the wavelet domain, motion artifacts appear as abrupt breaks, whereas hemodynamic responses to stimuli have less variable coefficients. To remove the motion artifacts, wavelet coefficients above 0.1 interquartile range were set to zero, the same setting as in [21]. Finally, ΔHbO and ΔHbR were calculated using the modified Beer-Lambert law [63], with the differential pathlength factor adjusted for age and for the wavelengths of near-infrared light [64].
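For two wavelengths, the modified Beer-Lambert law reduces to a 2×2 linear system: the optical density change at each wavelength equals (ε_HbO·ΔHbO + ε_HbR·ΔHbR) multiplied by the source-detector distance and the differential pathlength factor (DPF). The Python sketch below solves that system. It is a generic illustration, not the Homer2 routine; the extinction coefficients and DPF values must be supplied from published tables and are deliberately left as parameters.

```python
def mbll_two_wavelengths(d_od, eps, distance_cm, dpf):
    """Solve the modified Beer-Lambert law for (dHbO, dHbR).

    d_od : (dOD at wavelength 1, dOD at wavelength 2)
    eps  : ((eps_HbO_1, eps_HbR_1), (eps_HbO_2, eps_HbR_2))
    distance_cm : source-detector separation
    dpf  : (DPF at wavelength 1, DPF at wavelength 2)
    """
    # Effective optical path length at each wavelength.
    l1, l2 = distance_cm * dpf[0], distance_cm * dpf[1]
    a, b = eps[0][0] * l1, eps[0][1] * l1
    c, d = eps[1][0] * l2, eps[1][1] * l2
    det = a * d - b * c
    if det == 0:
        raise ValueError("extinction matrix is singular")
    d_hbo = (d * d_od[0] - b * d_od[1]) / det
    d_hbr = (a * d_od[1] - c * d_od[0]) / det
    return d_hbo, d_hbr

def derived_signals(d_hbo, d_hbr):
    """Return (dHbT, dHbC): the sum and the difference of dHbO and dHbR."""
    return d_hbo + d_hbr, d_hbo - d_hbr
```

The units of the returned concentration changes follow from the units of the extinction coefficients that are passed in.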

4) Subtract the short-channel component

A principal component analysis (PCA) was performed separately on the HbO and HbR responses from short channels with SCI ≥ 0.15. The mean ± SD number of short channels included in the PCA across participants was 6.91 ± 1.33. The first two principal components (PCs), i.e., those contributing most to the short-channel responses, were assumed to be the ‘global’ components across channels and needed to be removed. The mean ± SD of the total variance that the two PCs explained in the HbO and HbR responses was 77.25% ± 6.39% and 73.21% ± 10.43%, respectively. The two PCs were used as regressors in a general linear model (GLM); each PC multiplied by its fitted GLM coefficient was then subtracted from the HbO and HbR signals, separately, in each channel. A third-order Butterworth band-pass filter (cut-off frequencies of 0.01 and 0.09 Hz) was applied to remove high-frequency physiological signals [65], such as respiration and heartbeats.
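The regression step can be illustrated with a small Python sketch that removes the fitted contribution of each short-channel component from one regular channel. It is a simplified stand-in for the GLM described above, not the authors' implementation. It assumes the component time courses are zero-mean and mutually orthogonal (as PCA components are), so each coefficient can be estimated by an independent projection; the PCA itself and the band-pass filter are not shown.

```python
def regress_out(signal, regressors):
    """Subtract the least-squares fit of each regressor from `signal`.

    Assumes zero-mean, mutually orthogonal regressors. The signal is
    mean-centred first, which stands in for the GLM intercept.
    """
    n = len(signal)
    mean = sum(signal) / n
    cleaned = [v - mean for v in signal]
    for reg in regressors:
        energy = sum(r * r for r in reg)
        if energy == 0:
            continue  # an all-zero regressor explains nothing
        beta = sum(c * r for c, r in zip(cleaned, reg)) / energy
        cleaned = [c - beta * r for c, r in zip(cleaned, reg)]
    return cleaned

# Toy example: two orthogonal "global" components plus a small residual.
pc1 = [1.0, -1.0, 1.0, -1.0]
pc2 = [1.0, 1.0, -1.0, -1.0]
neural = [0.5, -0.5, -0.5, 0.5]
mixed = [2 * a + 3 * b + s + 10.0 for a, b, s in zip(pc1, pc2, neural)]
recovered = regress_out(mixed, [pc1, pc2])
```

Because the toy residual is orthogonal to both components, the output matches it once the two components and the constant offset are removed.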

5) Average responses across blocks and exclude outliers

Block-average responses were calculated for ΔHbO, ΔHbR and ΔHbC, with the baseline average of each block, i.e., the 5 s before stimulus onset, being subtracted. All blocks were included, regardless of participants’ accuracy in pressing a mouse button to indicate hearing an even or odd number of color words, except for individual blocks that had values above or below the mean ± 2.5 SD of the group. The means of block-average responses across the channels clustered into each ROI were then calculated.
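Baseline correction and block averaging can be sketched as follows in Python. The sampling rate is a hypothetical parameter because the text does not give one, and the outlier-exclusion rule (mean ± 2.5 SD) is intentionally left out because its exact definition is not spelled out here.

```python
def block_average(blocks, fs_hz, baseline_s=5.0):
    """Baseline-correct each block, then average across blocks.

    blocks : list of equal-length sample lists, each starting
             `baseline_s` seconds before stimulus onset.
    fs_hz  : sampling rate in Hz (hypothetical parameter).
    """
    n_base = int(round(baseline_s * fs_hz))
    corrected = []
    for block in blocks:
        baseline = sum(block[:n_base]) / n_base
        corrected.append([v - baseline for v in block])
    n_blocks = len(corrected)
    return [sum(samples) / n_blocks for samples in zip(*corrected)]

# Two short toy blocks with different resting offsets.
toy_blocks = [[1.0, 1.0, 2.0, 3.0], [3.0, 3.0, 5.0, 5.0]]
averaged = block_average(toy_blocks, fs_hz=0.4)  # 5 s baseline = 2 samples
```

Subtracting the pre-stimulus mean puts both toy blocks on a common zero baseline before they are averaged.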

6) Quantify ΔHbC responses

Further analyses were performed on ΔHbC amplitudes, which combined ΔHbO and ΔHbR information and were calculated by first identifying the peak of the responses within 5–17 s after stimulus onset. The means within 5 s of ΔHbC responses centered at the peaks were then calculated for individual channels for each participant in each condition.
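The amplitude measure described above can be sketched in Python as a peak search followed by a windowed mean. The sampling rate is a hypothetical parameter, and the sketch takes the "peak" to be the maximum value within the search interval, which is an assumption; a largest-magnitude rule would be an alternative reading.

```python
def peak_amplitude(response, fs_hz, search_s=(5.0, 17.0), window_s=5.0):
    """Mean of a window centred on the peak found in the search interval.

    response : block-averaged samples, with index 0 at stimulus onset.
    fs_hz    : sampling rate in Hz (hypothetical parameter).
    """
    start = int(round(search_s[0] * fs_hz))
    stop = min(int(round(search_s[1] * fs_hz)) + 1, len(response))
    # Assumption: the peak is the largest value in the search interval.
    peak_idx = max(range(start, stop), key=lambda i: response[i])
    half = int(round(window_s * fs_hz / 2))
    lo = max(peak_idx - half, 0)
    hi = min(peak_idx + half + 1, len(response))
    window = response[lo:hi]
    return sum(window) / len(window)

# Toy response sampled at 1 Hz: a triangular bump peaking 10 s after onset.
toy = [max(0.0, 5.0 - abs(i - 10)) for i in range(25)]
amplitude = peak_amplitude(toy, fs_hz=1.0)
```

Averaging around the peak rather than taking the single peak sample makes the measure less sensitive to residual noise in individual samples.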

7) Calculate ΔHbC amplitudes in ROIs

For our primary research interest, ΔHbC amplitudes from channels located above the frontal (n = 10) and temporal (n = 10) regions on each hemisphere were first averaged, separately. Channels above the LPFC were further clustered into three subregions based on their anatomic positions. Including AC, fNIRS responses were examined in four ROIs on both hemispheres, with each ROI consisting of three channels.

Scoring self-reported task difficulty

Alongside fNIRS data, subjective assessment of the task difficulty levels was acquired from all participants. Immediately after each of the 6 testing periods, participants were asked to score the difficulty in understanding the sentences (on a scale of 0 to 10, with 0 corresponding to no difficulty and 10 corresponding to extremely difficult; for details, see Fig 4A). For each participant, the self-reported difficulty score for each condition consisted of the mean of the 3 difficulty scores measured from the three testing periods.

Fig 4

Subjects’ self-reported difficulty levels and speech intelligibility scores.

Unprocessed (U, blue) or degraded (V, red) speech stimuli were always presented to the left ear alone. Noise stimuli were presented in ipsilateral (NmSm) or bilateral (NoSm, squares) conditions. In panel (A), violin plots show self-reported difficulty levels in individuals under varying listening conditions at -10dB SNR (left) and -15 dB SNR (right) during fNIRS recording. Panel (B) shows the self-reported difficulty levels in a separate group of participants in a behavioral session with no fNIRS data being recorded. Panel (C) shows the correlation between the self-reported difficulty levels (vertical) and speech intelligibility in rationalized arcsine units (RAU, horizontal) in individuals in the behavioral session.

Scoring speech intelligibility

To evaluate the average effect of listening conditions on speech intelligibility, a different group of participants who had not been exposed to the stimuli before was tested without fNIRS data collection. Participants listened to a set of matrix sentences and responded to one sentence at a time by using a computer mouse to click on buttons showing a closed set of words displayed on a monitor in front of them. There was a break after each sentence and the task was performed at the individual’s own pace. After every 5 sentences in the same condition, participants rated the task difficulty from 0 to 10 through a computer program. The order of listening conditions was randomized and a total of 20 sentences (100 words) per condition were tested. The accuracy was calculated as the percentage of correct responses participants made per condition, then a rationalized arcsine transform [66] was used to transform accuracy into speech intelligibility. Note that the speech intelligibility task was different from the color word identification task performed during fNIRS data collection, with the latter requiring participants to count the number of color words in 5 sentences but not to identify each word. The color-word counting task was designed to keep listeners’ attention on the stimuli while avoiding frequent button presses or articulation during the speech presentation, which would result in motion artifacts and motor cortical activity contaminating the fNIRS data.
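For readers unfamiliar with the rationalized arcsine transform, the sketch below uses the commonly cited formulation, θ = arcsin√(X/(N+1)) + arcsin√((X+1)/(N+1)) and RAU = (146/π)θ − 23, where X is the number of correct items out of N. Whether the analysis here used exactly this variant is an assumption, and the function name is hypothetical. The example uses N = 100 to match the 100 words tested per condition.

```python
import math


def rationalized_arcsine(correct, total):
    """Convert `correct` items out of `total` into rationalized arcsine units."""
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    # Two-term arcsine transform, in radians.
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    # Linear rescaling so that mid-range scores track percent correct.
    return (146.0 / math.pi) * theta - 23.0


# 100 words per condition, as in the behavioral session described above.
for words_correct in (0, 25, 50, 75, 100):
    print(words_correct, round(rationalized_arcsine(words_correct, 100), 1))
```

The transform stretches scores near floor and ceiling, so differences between conditions are more comparable across the scale than raw percentages would be.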

Statistical analysis

Statistical analyses were carried out using R (R Core Team, 2019). Aligned rank transform (ART) tests [67], which are nonparametric factorial analyses of variance (ANOVA), were used for two reasons: first, the self-reported difficulty scores were ordinal; second, the fNIRS data were not normally distributed and did not meet the sphericity assumption. ART tests (‘ARTool’ package) with a mixed model (‘lmer’ function, ‘lme4’ package) were conducted separately on 1) the self-reported difficulty levels during the fNIRS session, 2) the difficulty levels from the separate behavioral test, and 3) the speech intelligibility results, with speech type (unprocessed or vocoded), masker condition (NmSm or NoSm), and SNR (-10 dB or -15 dB) as fixed factors and participant as a random factor. Post hoc pairwise comparisons within single factors were conducted using estimated marginal means (‘emmeans’ package) with the Tukey method for p-value adjustment. The function ‘testInteractions’ (‘phia’ package) was used to follow up significant interactions between factors. Repeated measures correlations [rmcorr, 68] were calculated between the speech intelligibility scores and difficulty levels from the behavioral test (without fNIRS data collection). For fNIRS measures, we first conducted an ART test to compare ΔHbC amplitudes between the frontal and temporal regions on both hemispheres to confirm the regional differences in cortical activity during speech perception. To address our primary research question of whether the left DLPFC and AC responded differently to speech with varying configurations, we conducted an ART test to compare fNIRS measures between these two regions, with speech type, masker condition, SNR, and ROI as fixed factors and participant as a random factor. For post hoc analyses, the same methods were used as for the behavioral measures.
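To illustrate what the aligned rank transform does before the ANOVA step, the sketch below aligns and ranks a response for one effect at a time in a two-factor design. It is a simplified illustration in Python rather than the R workflow used in the study: it assumes only two fixed factors, ignores the random participant factor, and omits the final factorial model fitted to the ranks. All function and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def align_and_rank(y, a, b, effect="A"):
    """Align responses for one effect ("A", "B" or "AxB"), then rank them."""
    grand = mean(y)
    by_a, by_b, by_cell = defaultdict(list), defaultdict(list), defaultdict(list)
    for yi, ai, bi in zip(y, a, b):
        by_a[ai].append(yi)
        by_b[bi].append(yi)
        by_cell[(ai, bi)].append(yi)
    aligned = []
    for yi, ai, bi in zip(y, a, b):
        cell = mean(by_cell[(ai, bi)])
        residual = yi - cell  # strips every effect from the response
        if effect == "A":
            estimate = mean(by_a[ai]) - grand
        elif effect == "B":
            estimate = mean(by_b[bi]) - grand
        elif effect == "AxB":
            estimate = cell - mean(by_a[ai]) - mean(by_b[bi]) + grand
        else:
            raise ValueError("effect must be 'A', 'B' or 'AxB'")
        # Add back only the effect of interest; round to stabilize ties.
        aligned.append(round(residual + estimate, 9))
    return aligned, average_ranks(aligned)


# Toy data: speech type (U/V) crossed with masker condition (NmSm/NoSm).
scores = [1, 2, 3, 4, 5, 6, 7, 8]
speech = ["U", "U", "U", "U", "V", "V", "V", "V"]
masker = ["NmSm", "NmSm", "NoSm", "NoSm", "NmSm", "NmSm", "NoSm", "NoSm"]
aligned, ranks = align_and_rank(scores, speech, masker, effect="A")
print(ranks)
```

A separate aligned-and-ranked response is produced for each main effect and interaction, and only the effect that a given response was aligned for is interpreted from the corresponding model.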
Further, to examine fNIRS measures of task demands, repeated measures correlations were calculated between the self-reported task difficulty levels for the varying conditions (n = 8) and ΔHbC amplitudes in the left DLPFC and AC. To address our secondary question of whether the three sub-regions within the LPFC responded differently, we conducted another ART test. Because the first ART test identified a significant difference between the two hemispheres, with greater activity on the right than on the left, we examined the LPFC on both hemispheres. This ART test was conducted with speech type, masker condition, SNR, ROI, and hemisphere as fixed factors and participant as a random factor.
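The repeated measures correlation used above can be illustrated with a short sketch. It relies on the fact that the rmcorr coefficient equals the Pearson correlation computed after each participant's own means are subtracted from both variables, with degrees of freedom equal to the number of observations minus the number of participants minus one. The function name and the toy data are hypothetical, and this is not the `rmcorr` R package itself.

```python
from collections import defaultdict
from math import sqrt


def repeated_measures_correlation(subjects, x, y):
    """Return (r, degrees of freedom) for the within-participant association."""
    groups = defaultdict(list)
    for s, xi, yi in zip(subjects, x, y):
        groups[s].append((xi, yi))
    sxy = sxx = syy = 0.0
    for pairs in groups.values():
        mean_x = sum(p[0] for p in pairs) / len(pairs)
        mean_y = sum(p[1] for p in pairs) / len(pairs)
        # Center each participant's data on that participant's own means.
        for xi, yi in pairs:
            dx, dy = xi - mean_x, yi - mean_y
            sxy += dx * dy
            sxx += dx * dx
            syy += dy * dy
    r = sxy / sqrt(sxx * syy)
    dof = len(x) - len(groups) - 1
    return r, dof


# Toy data: within each participant y rises with x, but participants with
# larger x values have lower baselines, so pooling would mislead.
subjects = [0, 0, 0, 1, 1, 1, 2, 2, 2]
difficulty = [1, 2, 3, 4, 5, 6, 7, 8, 9]
amplitude = [32, 34, 36, 23, 25, 27, 14, 16, 18]
r, dof = repeated_measures_correlation(subjects, difficulty, amplitude)
print(round(r, 3), dof)
```

Because between-participant offsets are removed, the coefficient reflects how a measure changes across conditions within individuals, which is the relationship of interest when each participant contributes one value per listening condition.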

Results

Self-reported task difficulty levels during fNIRS testing

Fig 4A shows results from the subjective assessment of task difficulty in individuals during fNIRS recording and the rating scale that was used. At both SNRs, participants reported lower difficulty for the unprocessed speech (in blue) than for the vocoded speech (in red), and lower difficulty for speech in the left ear with diotic noise (NoSm) than with monaural noise (NmSm), suggesting that binaural unmasking may have reduced task difficulty. The vocoded speech with monaural noise was judged to be the hardest condition, with a few participants rating this condition as extremely hard at -15 dB SNR. Results from ART tests, as reported in Table 1, showed significant main effects of SNR, speech type, and masker condition (NmSm versus NoSm), all with p < 0.001. The results revealed greater difficulty in understanding vocoded speech compared to unprocessed speech, and less difficulty in listening to speech with diotic noise compared to monaural noise. The ART tests also showed a significant interaction between speech type and masker condition (F(1,154) = 4.08, p = 0.045), with a greater effect of binaural unmasking in the vocoded condition than in the unprocessed condition.

Table 1

Summary of results from ART tests for behavioral measures.

The behavioral measures are task difficulty level (TDL) recorded during fNIRS session and in the behavioral session with no fNIRS, and speech intelligibility scores (SIC) recorded in the behavioral session.

Measure	SNR	Speech	Masker	SNR * speech	SNR * masker	Speech * masker	SNR * speech * masker
TDL (fNIRS)	F(1,154) = 24.83, p < .001	F(1,154) = 669.25, p < .001	F(1,154) = 90.96, p < .001	F(1,154) = 0.78, p = .38	F(1,154) = 0.11, p = .74	F(1,154) = 4.08, p = .045	F(1,154) = 0.75, p = .39
TDL (no fNIRS)	F(1,98) = 66.69, p < .001	F(1,98) = 485.30, p < .001	F(1,98) = 258.44, p < .001	F(1,98) = 5.61, p = .020	F(1,154) = 0.36, p = .55	F(1,98) = 27.09, p < .001	F(1,98) = 4.26, p = .042
SIC (no fNIRS)	F(1,98) = 153.77, p < .001	F(1,98) = 509.01, p < .001	F(1,98) = 494.08, p < .001	F(1,98) = 5.21, p = .025	F(1,98) = 7.26, p = .008	F(1,98) = 268.99, p < .001	F(1,98) = 5.02, p = .027

Behavioral results without fNIRS recording

Fig 4B plots the self-reported difficulty levels from a separate group of 15 individuals in a behavioral session without fNIRS recording. As shown in Fig 4B, the patterns of self-reported task difficulty levels across listening conditions in this group were similar to those in the group who reported the difficulty levels during the fNIRS session (Fig 4A). Results from ART tests, as reported in Table 1, also showed significant main effects of SNR, speech type, and masker condition on this set of self-reported task difficulty levels, all with p < 0.001. There was also a significant interaction between SNR, speech type, and masker condition (F(1, 98) = 4.26, p = 0.042). Fig 4C plots the self-reported difficulty levels versus speech intelligibility scores in each condition at the two SNRs. Results from ART tests on the speech intelligibility scores showed significant main effects of SNR, speech type, and masker condition (Table 1). There were also significant interactions between the following factors: SNR * masker condition, SNR * speech type, speech type * masker condition, and SNR * speech type * masker condition. Further, the repeated measures correlation analysis found a significant negative correlation between self-reported task difficulty levels and speech intelligibility scores (r = -0.95, p < 0.001), indicating that listeners rated the task as more difficult in conditions where speech intelligibility was lower.

fNIRS responses in the LPFC and AC

Frontal versus temporal cortex

Fig 5 plots the group means (markers) and SEMs (error bars) of ΔHbC amplitudes for the frontal (orange) and temporal (green) cortex on the left and right hemispheres. For each region, the ΔHbC amplitudes across 10 channels were first averaged within individuals. As shown in Fig 5 and confirmed by an ART test, ΔHbC amplitudes in the frontal cortex on both hemispheres were greater than in the temporal cortex (t(710) = 6.42, p < 0.001), with greater responses on the right hemisphere than on the left (t(710) = 2.04, p = 0.042). The ART test did not show a significant interaction between cortical region and hemisphere (F(1,710) = 1.11, p = 0.29). The significantly greater amplitudes on the right versus the left hemisphere were likely because speech stimuli were always presented to the left ear, which might result in greater contralateral than ipsilateral cortical activity.

Fig 5

ΔHbC amplitudes in the frontal and temporal cortex.

Group mean ΔHbC amplitudes for the frontal (orange) and temporal (green) cortex on the left and right hemispheres in response to unprocessed (dots) and vocoded speech (triangles) at -10 dB (solid lines) and -15 dB SNR (dashed lines) are plotted.

Comparing responses in the left DLPFC versus AC

For our primary research interest, we focused on fNIRS measures from the left DLPFC and AC. Fig 6 shows the group mean (markers) and SEM (bars) of the ΔHbC amplitudes for the left DLPFC (panel A) and the AC (panel C) for unprocessed (blue, circles) and vocoded (red, triangles) speech with diotic (NoSm) and monaural (NmSm) noise. For each ROI, results for -10 dB and -15 dB SNR are plotted in the left (solid lines) and right (dashed lines) columns. As shown in Fig 6, the left DLPFC and AC showed opposite patterns across conditions. Results from the ART test showed a significant difference between the two ROIs, with smaller responses in the left DLPFC than in the AC (see Table 2). Further, there were significant interactions between speech type * ROI, and between SNR * ROI, with the left DLPFC showing greater differences between vocoded and unprocessed conditions and greater responses at -15 dB versus -10 dB SNR, compared to the left AC. Fig 6B and 6D show the repeated measures correlation results between self-reported task difficulty levels and the ΔHbC amplitudes in the two ROIs. Results demonstrated a significant and positive correlation for the left DLPFC (panel B; r = 0.266, p = 0.004), suggesting a neural marker in the left DLPFC for task demands, with a negative but non-significant correlation for the left AC (panel D; r = -0.134, p = 0.09). The effect size in the DLPFC was relatively small (r = 0.266). Indeed, the correlation was driven by the larger responses at -15 dB versus -10 dB SNR, and greater responses to vocoded versus unprocessed speech, but not by binaural unmasking. We did not observe smaller responses in the NoSm conditions, which were self-reported as easier than the NmSm conditions.

Fig 6

ΔHbC amplitudes in the left DLPFC and AC.

Panels (A) and (C) show the group mean (markers) and SEM (error bars) of ΔHbC amplitudes in the left DLPFC and AC, respectively, in the unprocessed (blue dots) and vocoded (red triangles) speech conditions with monaural (NmSm) and diotic noise (NoSm) at -10 dB (solid lines) and -15 dB SNR (dashed lines). Panels (B) and (D) show the repeated measures correlations (rmcorr) between ΔHbC amplitudes in the left DLPFC and AC, respectively, and self-reported task difficulty level. In each panel, the gray dashed lines connecting circles represent measures in individuals in different conditions; the red lines indicate the regression result from the rmcorr method.

Table 2

Summary of results from ART tests for fNIRS measures.

The left and right sides summarize the results related to our primary and secondary research questions, respectively. We investigated the effect of speech type by comparing unprocessed (U) versus vocoded (V) speech, the effect of masker condition by comparing diotic (NoSm) and monaural (NmSm) conditions, the effect of SNR (-10 and -15 dB SNR), and the interactions between them on different cortical regions.

Frontal versus temporal regions			DLPFC, f-ROI2, and f-ROI3
Factor(s)	ART results	Post hoc	Factor(s)	ART results	Post hoc
Region	F(1, 710) = 41.22, p < .001	Frontal > temporal: t(710) = 6.42, p < .001	Hemisphere	F(1, 1034) = 11.96, p < 0.001	Right > left: t(1034) = 3.46, p < .001
Hemisphere	F(1, 710) = 4.14, p = .042	Right > left: t(710) = 2.04, p = .042	ROI	F(1, 1034) = 8.58, p < 0.001	f-ROI2 > DLPFC, p = .005; f-ROI2 > f-ROI3, p < .001
Hemisphere*region	F(1, 710) = 1.11, p = .29		Hemisphere*ROI	F(1, 1034) = 2.75, p = 0.064	DLPFC > f-ROI3: Right − Left
Left DLPFC versus AC			SNR	F(1, 1034) = 5.08, p = 0.024	-15 dB > -10 dB SNR: t(1034) = 2.25, p = .024
Factor(s)	ART results	Post hoc	Speech	F(1, 1034) = 3.89, p = 0.049	V > U: t(1034) = 1.97, p = .049
ROI	F(1, 330) = 6.61, p = .011	DLPFC < AC: t(330) = 2.57, p = .011	SNR*speech	F(1, 1034) = 3.35, p = 0.068	-15 dB SNR > -10 dB SNR: V vs U
ROI*SNR	F(1, 330) = 5.30, p = .022	LPFC > AC: -15 vs -10 dB SNR	Masker*SNR	F(1, 1034) = 5.60, p = 0.018	NoSm vs NmSm: -15 dB > -10 dB SNR
ROI*speech	F(1, 330) = 6.54, p = .011	LPFC > AC: V vs U	Masker	F(1, 1034) = 0.13, p = 0.72
Masker	F(1, 330) = .12, p = .72		Masker*hemisphere	F(1, 1034) = 2.11, p = 0.15
SNR	F(1, 330) = .15, p = .69		Masker*ROI	F(1, 1034) = 0.039, p = 0.67
Speech	F(1, 330) = .20, p = .65		Masker*speech	F(1, 1034) = 0.79, p = 0.38
Masker*ROI	F(1, 330) = .019, p = .89		Hemisphere*SNR	F(1, 1034) = 0.062, p = 0.80
Masker*SNR	F(1, 330) = .003, p = .96		Hemisphere*speech	F(1, 1034) = 0.077, p = 0.78
Masker*speech	F(1, 330) = 1.56, p = .21		ROI*SNR	F(1, 1034) = 0.063, p = 0.53
SNR*speech	F(1, 330) = 1.45, p = .23		ROI*speech	F(1, 1034) = 0.069, p = 0.50

Responses in the three subregions in the LPFC

To address our secondary question, the fNIRS responses were examined in the three sub-regions within the LPFC on both hemispheres. Fig 7 plots the group means (markers) and SEMs (shaded areas) of ΔHbC amplitudes on the left and right hemispheres, for the DLPFC (panels A, B), f-ROI2 (panels C, D) and f-ROI3 (panels E, F). An ART test was conducted on the ΔHbC amplitudes for the three subregions on the two hemispheres. Results showed significant main effects of speech type, SNR, ROI and hemisphere, and a significant interaction between masker condition * SNR. Detailed results are reported in Table 2. Post hoc analyses found greater responses to the vocoded versus unprocessed speech (t(1034) = 1.97, p = 0.049) and greater responses on the right hemisphere compared to the left (t(1034) = 3.46, p < 0.001). Among the three ROIs, f-ROI2 showed greater ΔHbC amplitudes than the DLPFC (t(1034) = 3.12, p = 0.005) and f-ROI3 (t(1034) = 3.92, p < 0.001). For the interaction between masker condition and SNR, post hoc analysis found greater differences between NoSm and NmSm at -15 dB SNR than at -10 dB SNR (χ²(1) = 5.60, p = 0.018).

Fig 7

ΔHbC amplitudes in the three subregions within the LPFC on two hemispheres.

Discussion

In the current study, we used fNIRS to investigate cortical activity in response to vocoded versus unprocessed speech, and to compare conditions with monaural versus diotic noises at two different SNRs. These configurations were selected to understand how the brain responds when listening to speech with varying configurations designed to induce binaural unmasking. To ascertain task demands and participants’ performance, respectively, we also recorded participants’ self-reported task difficulty and speech intelligibility for each condition. For fNIRS responses, we were primarily interested in the LPFC and the AC as the two regions have been reported to respond differently depending on attention and speech type. Our secondary interest was whether each of the three sub-regions within the LPFC was more sensitive to some configurations than the others.

LPFC and AC responded differently to two speech types

We expected that greater cognitive resources would be spent to overcome obstacles in goal pursuit in the more challenging conditions [3]. This would be manifested as greater changes in the ΔHbC (cerebral oxygenation) amplitudes, which are associated with increased neuronal activity through neurovascular coupling [11]. Our results showed significantly greater changes in the ΔHbC amplitude in the frontal compared to the temporal regions when responding to varying types of configurations. Further, our a priori analysis found different response patterns of ΔHbC amplitudes between the left DLPFC and AC, with the left DLPFC showing greater differences between the two SNR levels and between the two speech types (Table 2). The greater changes in cortical activity in the DLPFC compared to the AC could be related to effortful speech perception. Consistent with our results, previous neuroimaging studies using fMRI [46] and fNIRS [20] also found different patterns in the LPFC and AC in response to vocoded and unprocessed speech depending on attention. Both studies showed greater responses to the vocoded speech compared to the unprocessed speech in the left LPFC when listeners attended to the target speech rather than irrelevant distracters. However, responses in the AC on both sides were not affected by listeners’ attention. Taken together, these results suggest that effortful perception of spectrally degraded speech, which requires attentive listening, is associated with greater changes in cortical activity in the left LPFC but not in the AC. The differences in cortical locations between the current study and the above two studies, i.e., left DLPFC versus IFG, could be due to the limited spatial resolution of fNIRS compared to fMRI or to differences in the recording regions of the fNIRS systems. The reported regions could overlap or share the same cognitive functions for effortful speech perception. Alternatively, different sub-regions may be involved in effortful speech perception with varying configurations.
Hence, we divided the LPFC into three sub-regions, i.e., DLPFC, f-ROI2, and f-ROI3, and explored fNIRS measures in these regions on both hemispheres. Our results showed a significant effect of speech type, with greater ΔHbC amplitudes to vocoded versus unprocessed speech, and significant differences in responses between the three subregions, with greater responses in f-ROI2 compared to the DLPFC and f-ROI3. Further, the DLPFC showed greater hemispheric differences than f-ROI3. However, there was no significant interaction between speech type * ROI. Our results did not demonstrate significantly different effects of SNR or masker condition between the three sub-regions, either. It is possible that there are no functional differences, and that the three subregions in the current study (Fig 1B) overlap and share common functions in effortful speech perception with varying stimulus configurations. Each ROI consisted of three 3-cm channels, and the ROIs overlapped on the surface, i.e., DLPFC (BA 9 and 10), f-ROI2 (BA 9 and 46), and f-ROI3 (BA 45 and 47). As shown in the sensitivity map (Fig 1B), the three regions might share some measures of changes in hemoglobin from the same origins. It is also likely that our study was underpowered due to the small sample, so that potential differences between the three subregions could not be detected. Further, the configurations, i.e., spectral degradation or binaural unmasking at two SNRs, might be too complicated. Future studies need to include larger samples or to focus on no more than two factors concurrently when investigating the role of the LPFC in binaural unmasking and in processing spectrally degraded information.

Evidence of effect of SNR but not binaural unmasking

Although our data suggest a potential signature of task difficulty in the DLPFC, related to worsening SNR, we found no cortical signature corresponding to binaural unmasking in this region. This finding is somewhat surprising, as our behavioral results showed a significant main effect of masker condition for self-reported task difficulty levels, with greater task difficulty in the monaural compared to the diotic conditions. In the former conditions, both speech and noise would be perceived in the same ear (co-located). In contrast, when target speech is presented to one ear and noise is presented to both ears, if binaural integration for the masker occurs, listeners perceive the speech from the left ear, and the noise is perceived as a single, fused auditory image at the center/front of their head; this separation is known to induce binaural unmasking [69, 70]. Binaural unmasking has been shown to improve speech reception threshold (SRT) in noise by 5–8 dB or more (up to 12–15 dB), depending on the speech materials utilized, because spatial separation of target and maskers improves intelligibility and speech understanding [71–73]. Our behavioral results revealed the effect of binaural unmasking by showing better speech intelligibility and lower self-reported task difficulty levels. We therefore expected some evidence of binaural unmasking in the neural activity. We specifically focused on the left DLPFC, previously demonstrated to boost visuospatial memory capacity in the parietal cortex, and to select the relevant verbal representation in the IFG through top-down control [74, 75]. However, our fNIRS measures did not show a significant main effect of masker condition in the left DLPFC. We considered the potential effect of the differences in the monaural masker noise level, as we reduced the noise level by 3 dB in the NoSm conditions to compensate for the doubling of intensity that would otherwise occur, hence raising the monaural SNR by 3 dB.
When examining the effect of binaural unmasking by comparing fNIRS measures between NoSm and NmSm conditions, the 3 dB increase in the monaural SNR in the NoSm condition could have diminished binaural unmasking, eliminating the neural correlate in the LPFC. On the other hand, the 3 dB difference might not affect fNIRS measures, as the improvement in SRT related to binaural unmasking could be up to 5–8 dB or even higher when responding to vocoded speech with same-band masker noise in separated versus co-located conditions [76]. We also considered the possibility that the left DLPFC plays a role in the cortical processing of binaural unmasking. For instance, one study [77] investigated auditory spatial processing in patients who had focal left or right hemisphere damage. Their results demonstrated that right hemispheric damage caused imprecision in distinguishing sounds presented from both hemispaces, whereas left hemispheric damage only caused imprecision in the contralateral hemispace. Another study [37] investigated cortical activity in the LFCx for target speech with a speech masker co-located with versus separated from the target. They found that the right LFCx showed significantly greater responses in the spatially separated versus co-located conditions. Therefore, as a secondary analysis, we examined the effect of varying configurations on the three subregions of the LPFC on both hemispheres. Our fNIRS results demonstrated a significant main effect of SNR, with greater ΔHbC amplitude at the lower (-15 dB) versus higher (-10 dB) SNR (Table 2). We also found a significant interaction between masker condition and SNR, with greater differences in the ΔHbC amplitudes for the LPFC between NoSm and NmSm conditions at -15 dB SNR compared to -10 dB SNR.
Greater responses to NoSm versus NmSm are consistent with results from a previous study [37] but opposite to our hypothesis, which proposed that binaural unmasking would reduce the task demand, hence resulting in smaller responses in the NoSm conditions. These results suggest that binaural unmasking might affect cortical activity by improving the salience of speech in noise rather than by reducing listening effort. When the SNR was lower (-15 dB SNR), i.e., the speech was softer, improving speech salience by separating speech and masker locations enhanced speech perception more than when the SNR was higher (-10 dB SNR). Further, the differences between studies in the hemisphere effects could be due to different experimental configurations. Indeed, using fNIRS, our study showed a significant effect of SNR in the LPFC on both hemispheres, whereas another fNIRS study [42] found elevated cortical activity in the right DLPFC in response to speech narratives recorded in naturalistic and noisy environments as the SNR decreased. Surprisingly, no evidence of binaural unmasking was found in the AC, which has been reported as a neural marker of binaural unmasking in both humans [31, 32] and other mammals [29]. Much like the left DLPFC, the AC faces the issue of limited near-infrared light penetration. In adults, the near-infrared light penetration is about 1.5 cm [78], which could be too shallow for the primary AC, which lies in the deeper superior temporal sulcus. Hence, optode channels above the AC might not be able to record neural signals with an adequate signal-to-noise ratio. Alternatively, it could again be due to the differences in experimental configurations and protocols, as some other studies [33, 34] did not demonstrate evidence of binaural unmasking in the AC.

Non-monotonic responses in the LPFC with task demands

Our results showed a significant and positive correlation between ΔHbC amplitudes in the left DLPFC and self-reported task difficulty levels. However, in the condition that was self-reported as most difficult with the lowest speech intelligibility (vocoded with monaural noise, VNmSm at -15 dB SNR), ΔHbC amplitudes did not increase compared to the second most difficult condition (Fig 6, VNoSm), i.e., the response was non-monotonic. The non-monotonic response in the LPFC in the current study could first be due to a non-linear relation between cognitive resources exerted and the increase in task demands [23, 46]. For instance, one fNIRS study [23] examined the cortical activity in a group of normal-hearing listeners when responding to speech with degraded spectral information and demonstrated a non-monotonic (U-shaped) pattern of ΔHbO responses in the left IFG versus speech intelligibility (0, 25, 50, 75, and 100% correct). The responses in the IFG were lower when speech was not intelligible at all or when it was relatively easy to understand (100% correct). The non-monotonic pattern of changes in cortical activity with increasing task demands was analogous to the patterns of pupil dilation with varying levels of speech intelligibility [79-82]. For instance, one pupillometry study [80] recorded pupil dilation while presenting speech sentences with varying levels of masking noise. They demonstrated that pupil dilation peaked at an intermediate intelligibility level and decreased when speech intelligibility was either so poor that it was close to floor level or so good that it reached ceiling level. We also considered the possibility of LPFC responses being modulated by varying task demands related to decision-making and working memory. Although the current study was designed to measure speech perception while recording fNIRS data, it could be argued that the task involved a decision-making component as well.
During the task, while listening to running speech, participants had to detect the color words, hold that information in memory, add up the number and decide if it was even or odd. Listening to degraded versus unprocessed speech, with monaural versus diotic noise, made it harder to recognize the words, thus potentially hindering the memory recall and decision regarding the color. Consistent with this view, greater activity was found in the right LPFC when participants were listening to degraded versus unprocessed speech and making decisions that involved semantic or syllable processing in a PET study [83]. The increased activity in the LPFC to vocoded speech in [83] and in the current study is compatible with the role of the PFC in decision making in auditory detection tasks, resulting in increased cognitive demands. To summarize, the non-monotonic pattern of cortical activity in the current study could be due to the high task demands in the most difficult condition, which also resulted in very poor speech intelligibility (Fig 4). It could also be modulated by increased attention and engagement for decision making and greater working memory demands when the task was reported as being more difficult.

Hemispheric differences are unrelated to handedness

Our results showed a significant main effect of hemisphere for ΔHbC amplitudes, with greater response amplitudes on the right hemisphere compared to the left (Table 2). These results may have occurred because in the current study speech was presented to the left ear alone, while the noise was presented either to the left ear or to both ears through insert earphones. Auditory perception involves the AC in both hemispheres, where greater contralateral representation is well known to exist [83, 84], i.e., greater responses on the side of the brain opposite to the ear of stimulation than on the ipsilateral side. In two previous studies that also examined effortful perception of degraded speech, no hemisphere difference was found [20, 46]. These inconsistencies might be accounted for by the use of free-field stimuli [20] or diotic stimulation [46], permitting sounds to reach both ears in all conditions. We considered whether the significant differences in fNIRS responses between the two hemispheres were driven by the handedness of the participants. Handedness has also been reported to affect asymmetric cortical activity involved in speech processing and localization [85, see reviews by 86]. According to the statistics reported in [86], speech processing in 97% of the right-handed participants is left lateralized, and is right lateralized in the remaining 3% of participants. In left-handed participants, in contrast, the ratios shift to 70% and 30%, respectively. The majority of asymmetrical cortical activity is found in the planum temporale and other primary and association auditory cortices [87]. For Broca’s area (BA 44 and 45), handedness was found to affect the asymmetries of the pars opercularis (BA 44), with the right-handed participants showing left-hemisphere asymmetry and the left-handed subjects showing right-hemisphere asymmetry [85].
In one fNIRS study [20], a small number of left-handed participants showed increased activity in the right IFG when listening to degraded speech versus unprocessed speech, opposite to what was found in their group of right-handed participants. Though the results were not significant due to the small sample size, they suggested that the laterality of the IFG, which showed signs of effortful activity, might be related to the handedness of subjects. However, this theory is insufficient for explaining our results, as 21 out of 23 participants were right-handed and the significant activity was found in the right hemisphere. To summarize the fNIRS findings, the significant results found here in the LPFC could be driven by several factors such as effortful listening for speech perception, attention and task engagement for decision making, working memory demands, and contralateral stimulation from the left ear. However, the effects did not stem from handedness.

Conclusion

The current study investigated whether neural signatures for binaural unmasking could be identified by examining cortical activity using fNIRS. Our results demonstrated significant differences between the left DLPFC and the AC, in responses to vocoded versus unprocessed speech, at two SNRs that were 5 dB apart, suggesting that these anatomical areas may play different roles in speech perception, in line with previous findings. Our fNIRS data did not demonstrate evidence of binaural unmasking in the LPFC; however, a significant interaction between SNR and masker condition suggests that binaural unmasking affects cortical activity in the LPFC through improving SRT rather than reducing effort exerted. The result that no significant regional differences existed within the LPFC suggests that these regions might share common cognitive functions in response to effortful speech perception in the current configurations.

8 Mar 2022

PONE-D-21-30090

Effects of degraded speech processing and binaural unmasking investigated using functional near-infrared spectroscopy (fNIRS)

PLOS ONE

Dear Dr. Zhou,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers felt that the manuscript is a significant contribution to our understanding of the neural basis of listening effort and spatial (un)masking. The larger concerns stem from questions about whether it is possible with fNIRS to spatially resolve regions of frontal cortex that are analyzed separately (i.e. IFG vs. DLPFC) and whether it is possible at all to measure activity from DMPFC given its depth. I agree with the recommendation of both reviewers that it would be better and technically more correct to lump these regions into "lateral prefrontal cortex" given that the main questions are about AC vs PFC and the fact that these data probably don't permit dissociation of sub-regions of PFC.

Please submit your revised manuscript by Apr 22 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Andrew R Dykstra Academic Editor PLOS ONE Journal requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf. 2. 
Please provide additional details regarding participant consent. In the Methods section, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information. 3. Please change "female” or "male" to "woman” or "man" as appropriate, when used as a noun (see for instance https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender). 4. Thank you for stating the following financial disclosure: “This study was supported by National Institute on Deafness and Other Communication Disorders (NIH-NIDCD, R01DC003083 to RL), UW-Madison’s Office of the Vice Chancellor for Research, and a Core grant from NIH-NICHD (U54HD090256 to Waisman Center).” Please state what role the funders took in the study. If the funders had no role, please state: ""The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."" If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf. 5. Thank you for stating the following in the Funding Section of your manuscript: “This study was supported by NIH-NIDCD (R01DC003083 to RL), UW-Madison’s Office of the Vice Chancellor for Research, and a Core grant from NIH-NICHD (U54HD090256 to Waisman Center).” We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 
Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: “This study was supported by National Institute on Deafness and Other Communication Disorders (NIH-NIDCD, R01DC003083 to RL), UW-Madison’s Office of the Vice Chancellor for Research, and a Core grant from NIH-NICHD (U54HD090256 to Waisman Center).” Please include your amended statements within your cover letter; we will change the online submission form on your behalf. 6. Thank you for stating the following in the Competing Interests section: “I have read the journal's policy. Dr. Litovsky discloses that she is a consultant for Frequency Therapeutics. The other authors have all certified that they have NO affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.” Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: ""This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared. Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf. 7. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. 
If consent was waived for your study, please include this information in your statement as well. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: I have read the following submission carefully and consider it to be of good quality, hypothesis driven, and methodologically transparent. The article describes a speech in noise task in which listeners must attend to speech that is either unprocessed or degraded by an 8-channel vocoder. The key measures are from fNIRS centered on primary auditory centers in both hemispheres. The authors tested the hypothesis that two specific areas (left AC and left DLPFC) would carry different information regarding the nature of performing on the task, specifically regarding attention versus speech intelligibility factors during effortful listening. If there was a single flaw in the presentation of the paper, it would be too much consideration of listening effort when setting up the motivation or discussing the results. The authors measured subjective assessments of task difficulty as an indirect measure of listening effort for separate individuals using a different task (counting versus speech recognition), and no objective measure of the participants in the fNIRS test was used. 
Therefore, in my opinion, the space dedicated to its potential relevance is mostly conjecture and unnecessary for the stated hypotheses (ex. Line 523). That aside, the paper offers additional evidence of the specific roles that primary auditory centers have in speech-in-noise listening, and therefore, it is good contribution to the field. The following are minor but should be addressed: 1) At various points (ex. Line 175), the authors refer to the poor fNIRS spatial resolution of previous studies (i.e., only 3-4 channels over certain areas). It’s unclear how that relates to the present paper other than maybe from counting dots in Figure 1. There’s a potential spot to mention the difference on Line 549. 2) Initially I flagged the stimulus methods for reducing the level of the masker by 3dB in the diotic condition; however, this was addressed in the discussion as a possible contributing factor to the results. It would be good, however, to provide a source reference for choosing 3 dB as the reduction on or around Line 235. Because this section specifically refers to loudness perception, and BLDELs can exceed 3, even 5 dB, for tonal stimuli, and it is common to have large individual variability, it would be helpful to acknowledge this upfront. 3) I may have missed it, but how accurate were listeners at counting color words and labeling odd/even for the fNIRS task? Were trials treated differently depending on their accuracy? Others: 1) Title page has alphabet superscripts and numeral affiliations 2) I am not familiar with the in-text citation format that includes the first two of multiple authors (e.g., Line 108), but for consistency, there was at least two instances where the more traditional et al was used instead (line 560 or 636). 
3) Typo: “unmaaking” (Line 97) 4) Delete “the” between “in….challenging conditions” 5) Line 228: This IEEE citation was not in the references 6) Line 230-231: I’m not sure what this is in parentheses after ER-2A phones – should be Etymotic? 7) Line 522: move “the” to after “between” (not before). 8) Line 633: change sNR to SNR 9) Line 715: The word “unfortunately” in this context sounds like the authors had a personal stake or interest in the results which I doubt is the case. I suggest removing this transition word. Reviewer #2: The authors investigated the differential involvement of the auditory cortex (AC) and dorsolateral prefrontal (DLPFC) cortex in different listening conditions that manipulate the spatial configuration of noise (monotic vs diotic inducing spatial release from masking), signal-to-noise ratio as well as speech processing (using a vocoder simulation or left unprocessed). They found that DLPFC and AC showed different patterns of activities for SNR and vocoded speech manipulation but did not find evidence of spatial unmasking changing activities in these areas. Furthermore, they found a correlation with perceived task difficulty and neural activity in the left DLPFC. Overall, this is a well written manuscript that uses sophisticated fNIRS analyses and appropriate statistical approaches. Nevertheless, I have the following concerns: Major: 1. It is unclear why the secondary question is there in the first place when the spatial resolution of fNIRS is clearly not sufficient to resolve these regions (as evident in the anatomical overlaps shown in Figure 1B). Perhaps the authors wanted to bring in the different literature to talk about listening effort and speech processing and thus needed to mention DLPFC and Inferior Frontal Gyrus (IFG) as potential regions of interest. 
Personally, I think it would just be cleaner to say that you have optodes over the lateral prefrontal cortex (and not make a delineation of dorsal vs ventral) and that the effects you see can be attributed to DLPFC and / or IFG. I think the authors already made the case in the discussion that ll545-7 "The lack of difference could also be due to the poor spatial resolution of fNIRS and measures from the three subregions could have still overlapped" so why present them in the first place? 2. I'm also doubtful that the medial prefrontal cortex is also part of the ROI because of how deep that structure is. As a reader, I also want to be convinced the the medial cortex can be captured in these optodes and only then bring in the literature regarding medial PFC in working memory, decision-making in your discussion. 3. p<0.09 should never be cast as "marginally non significant." Please remove this in the abstract and elsewhere in the manuscript. Minor: 1. In-text citation looks funny when the authors are mentioned (e.g., Ln 108, it should be Hughes et al and not Hughes, Rowe and there are many instances throughout the MS that needs to be fixed). 2. Fig 5 is missing SNR labels 3. Ln 523: why Left AC here and not just AC? 4. Ln 633 Change sNR to SNR ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Erol Ozmeral Reviewer #2: Yes: Adrian KC Lee [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. 
Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

29 Mar 2022

Dear Editor,

We want to thank you and the two reviewers for providing us with excellent feedback to further improve the quality of this manuscript. We have worked to clarify the two major concerns, i.e., too much discussion about listening effort from Reviewer #1, and fNIRS’ poor spatial resolution from Reviewer #2. We provide a point-by-point response below. We look forward to hearing back from you about the suitability of the paper for publication.

Editor’s comment

Both reviewers felt that the manuscript is a significant contribution to our understanding of the neural basis of listening effort and spatial (un)masking. The larger concerns stem from questions about whether it is possible with fNIRS to spatially resolve regions of frontal cortex that are analyzed separately (i.e. IFG vs. DLPFC) and whether it is possible at all to measure activity from DMPFC given its depth. I agree with the recommendation of both reviewers that it would be better and technically more correct to lump these regions into "lateral prefrontal cortex" given that the main questions are about AC vs PFC and the fact that these data probably don't permit dissociation of sub-regions of PFC.

Response: We appreciate the concern that the spatial resolution of fNIRS might not permit a distinction of DLPFC from IFG. Because we are interested in the questions raised in the study regarding the effects of varying configurations on cortical activity in the sub-regions within LPFC, in the revision, we did not label the two regions as DMPFC or IFG. Instead, we labelled them as f-ROI2 and f-ROI3, and spelled out their correspondence with Brodmann areas on the surface. Though our results did not find functional differences across the three sub-regions in varying configurations, we still showed significant differences across ROIs. We therefore elaborated on our interpretations of the results. Please see our detailed responses to Reviewer #2 below.

Comments from and responses to Reviewer #1 Reviewer #1: I have read the following submission carefully and consider it to be of good quality, hypothesis driven, and methodologically transparent. The article describes a speech in noise task in which listeners must attend to speech that is either unprocessed or degraded by an 8-channel vocoder. The key measures are from fNIRS centered on primary auditory centers in both hemispheres. The authors tested the hypothesis that two specific areas (left AC and left DLPFC) would carry different information regarding the nature of performing on the task, specifically regarding attention versus speech intelligibility factors during effortful listening. If there was a single flaw in the presentation of the paper, it would be too much consideration of listening effort when setting up the motivation or discussing the results. The authors measured subjective assessments of task difficulty as an indirect measure of listening effort for separate individuals using a different task (counting versus speech recognition), and no objective measure of the participants in the fNIRS test was used. Therefore, in my opinion, the space dedicated to its potential relevance is mostly conjecture and unnecessary for the stated hypotheses (ex. Line 523). That aside, the paper offers additional evidence of the specific roles that primary auditory centers have in speech-in-noise listening, and therefore, it is good contribution to the field. Response: We appreciate this comment about over discussing listening effort. In the revision, we deleted the sentence in the introduction ‘This result was consistent with the idea that task demands and motivation are related (3, 42), where listening effort increases with task difficulty but decreases again due to lack of motivation when the task becomes too difficult.’ In the discussion, we cut down listening effort and just focused on the measure of task demands. 
We reorganized the subsection of ‘Non-monotonic responses in the LPFC with task demands’ (lines 645-681), and discussed the non-monotonic pattern of cortical activity in the current study could be due to the high task demands in the most difficult condition, which also resulted in very poor speech intelligibility; this pattern could also be modulated by decision making and working memory that are involved when performing the task. The following are minor but should be addressed: 1) At various points (ex. Line 175), the authors refer to the poor fNIRS spatial resolution of previous studies (i.e., only 3-4 channels over certain areas). It’s unclear how that relates to the present paper other than maybe from counting dots in Figure 1. There’s a potential spot to mention the difference on Line 549. Response: We appreciate this comment. In the revision (lines 257-260), we included the sentence that ‘Each sub-region consisted of three channels (Fig. 1B). The DLPFC corresponded to Broadman area (BA, 9 and 10) on the surface, f-ROI2 corresponded to BA 9 and 46, and f-ROI3 corresponded to BA45 and BA47, which likely covers the IFG.’ We appreciate the suggestion about elaborating on the differences across the three subregions. In the discussion (lines 574-579), we included the sentences ‘It is possible that there are no functional differences, and the three subregions in the current study (Fig. 1B) still overlap and share common functions in the effortful speech perception with varying stimulus configurations. Each ROI consisted of three 3-cm channels and overlapped on the surface i.e., DLPFC (BA, 9 and 10), f-ROI2 (BA, 9 and 46), and f-ROI3 (BA, 45 and 47). As shown in the sensitivity map (Fig. 
1B), the three regions might share some measures of changes in hemoglobin from the same origins.’ 2) Initially I flagged the stimulus methods for reducing the level of the masker by 3dB in the diotic condition; however, this was addressed in the discussion as a possible contributing factor to the results. It would be good, however, to provide a source reference for choosing 3 dB as the reduction on or around Line 235. Because this section specifically refers to loudness perception, and BLDELs can exceed 3, even 5 dB, for tonal stimuli, and it is common to have large individual variability, it would be helpful to acknowledge this upfront. Response: We appreciate this comment. We agree BLDELs can be more than 3 dB and vary across stimuli. The 3 dB adjustment was planned to equalize the sound pressure level in two conditions, rather than the perception of loudness. We rephrased this sentence as ‘The 3-dBA reduction in the NoSm condition was introduced to compensate for the otherwise doubling of intensity, so that the sound pressure level would be equalized between NmSm and NoSm conditions’. 3) I may have missed it, but how accurate were listeners at counting color words and labeling odd/even for the fNIRS task? Were trials treated differently depending on their accuracy? Response: We appreciate this comment. In the revision we included the sentence that ‘All the blocks were included, regardless of participants’ accuracies in pushing a mouse button to indicate hearing an even or odd number of color words, except for individual blocks that had values 2.5 standard deviations above or below the mean of the group.’ Others: 1) Title page has alphabet superscripts and numeral affiliations Response: fixed! 2) I am not familiar with the in-text citation format that includes the first two of multiple authors (e.g., Line 108), but for consistency, there was at least two instances where the more traditional et al was used instead (line 560 or 636). 
Response: We appreciate this comment and revised the citation format through the text to be consistent. 3) Typo: “unmaaking” (Line 97) Response: fixed! 4) Delete “the” between “in….challenging conditions” Response: fixed! 5) Line 228: This IEEE citation was not in the references Response: We appreciate the comment and included the citation in the revision. 6) Line 230-231: I’m not sure what this is in parentheses after ER-2A phones – should be Etymotic? Response: Thank you for the comment. We revised the citation for this product. 7) Line 522: move “the” to after “between” (not before). Response: fixed! 8) Line 633: change sNR to SNR Response: fixed! 9) Line 715: The word “unfortunately” in this context sounds like the authors had a personal stake or interest in the results which I doubt is the case. I suggest removing this transition word. Response: We deleted this transition word, and also moved this sentence up to for a better flow in Conclusions. Please see lines 720-725. Reviewer #2: The authors investigated the differential involvement of the auditory cortex (AC) and dorsolateral prefrontal (DLPFC) cortex in different listening conditions that manipulate the spatial configuration of noise (monotic vs diotic inducing spatial release from masking), signal-to-noise ratio as well as speech processing (using a vocoder simulation or left unprocessed). They found that DLPFC and AC showed different patterns of activities for SNR and vocoded speech manipulation but did not find evidence of spatial unmasking changing activities in these areas. Furthermore, they found a correlation with perceived task difficulty and neural activity in the left DLPFC. Overall, this is a well written manuscript that uses sophisticated fNIRS analyses and appropriate statistical approaches. Nevertheless, I have the following concerns: Major: 1. 
It is unclear why the secondary question is there in the first place when the spatial resolution of fNIRS is clearly not sufficient to resolve these regions (as evident in the anatomical overlaps shown in Figure 1B). Perhaps the authors wanted to bring in the different literature to talk about listening effort and speech processing and thus needed to mention DLPFC and Inferior Frontal Gyrus (IFG) as potential regions of interest. Personally, I think it would just be cleaner to say that you have optodes over the lateral prefrontal cortex (and not make a delineation of dorsal vs ventral) and that the effects you see can be attributed to DLPFC and / or IFG. I think the authors already made the case in the discussion that ll545-7 "The lack of difference could also be due to the poor spatial resolution of fNIRS and measures from the three subregions could have still overlapped" so why present them in the first place? Response: We appreciate this comment. As previous studies reported different regions of interest within the LPFC, we asked an exploratory research question whether there are functional differences across sub-regions within the LPFC in varying configurations. We acknowledge that fNIRS has limited spatial resolution hence unable to measure response from DMPFC. In the revision, we made changes below to help clarify our rationale and interpretation of results. • In the introduction (lines 67-68), we phrased the secondary aim of this study as ‘Our secondary question was whether there were functional differences across sub-regions within the LPFC in varying stimulus configurations.’ • In the introduction (lines 167-173), to motivate the rationale of exploring regional differences, we included ‘The differences in regions involved could be partially due to the varying number of optodes employed in different fNIRS montages across experiments which have impacted the size of recording region (surface area). For instance, 4 channels on the LFCx in Zhang et al. 
(37) permitted a greater recording area compared with the 3 channels on the IFG in Wijayasiri et al. (20). Without a good coverage of the frontal area, it is difficult to parse out whether LPFC subregions overlap and share common functions, and whether the regions reported in the fNIRS studies (20, 23, 37) overlap with the regions reported in fMRI studies (36, 46, 47)’. • As suggested by the other reviewer, we cut down our discussions of listening effort as we did not have objective measures of effort from the fNIRS session, nor was listening effort most related to our hypotheses. Please see our response above. • In the discussion (lines 576-584), to interpret the results regarding the three sub-regions, we add the sentences ‘Each ROI consisted of three 3-cm channels and overlapped on the surface i.e., DLPFC (BA, 9 and 10), f-ROI2 (BA, 9 and 46), and f-ROI3 (BA, 45 and 47). As shown in the sensitivity map (Fig. 1B), the three regions might share some measures of changes in hemoglobin from the same origins. It is also likely that our data was underpowered due to the small sample, and any potential differences between the three subregions could not be assessed. Further, the configurations, i.e., spectral degradation or binaural unmasking at two SNRs, might be too complicated. Future studies are in need to include larger samples or to focus on no more than two factors concurrently when investigating the LPFC role for binaural unmasking and processing spectrally degraded information.’ 2. I'm also doubtful that the medial prefrontal cortex is also part of the ROI because of how deep that structure is. As a reader, I also want to be convinced the the medial cortex can be captured in these optodes and only then bring in the literature regarding medial PFC in working memory, decision-making in your discussion. Response: We appreciate this comment. 
In the revision, we did not label DMPFC and IFG; instead, we used ‘two adjacent regions of interests within LPFC’, noted as f-ROI2 and f-ROI3. We also agreed that we possibly stretched a bit far with our discussion regarding DMPFC activity related to decision-making and working memory. In the revision, we cut down this discussion about VMPFC, reorganized our discussion, and ‘considered the possibility of LPFC responses being modulated by varying task demands related to decision-making and working memory.’ Please see lines 664-681.

3. p<0.09 should never be cast as "marginally non significant." Please remove this in the abstract and elsewhere in the manuscript.

Response: We appreciate this comment and described p=0.09 as non-significant throughout the text in the revision.

Minor:

1. In-text citation looks funny when the authors are mentioned (e.g., Ln 108, it should be Hughes et al and not Hughes, Rowe and there are many instances throughout the MS that needs to be fixed).

Response: We appreciate this comment and revised the citation format throughout the text to be consistent.

2. Fig 5 is missing SNR labels

Response: fixed!

3. Ln 523: why Left AC here and not just AC?

Response: fixed!

4. Ln 633 Change sNR to SNR

Response: fixed!

Submitted filename: Response to Reviewers.docx

12 Apr 2022

Effects of degraded speech processing and binaural unmasking investigated using functional near-infrared spectroscopy (fNIRS)

PONE-D-21-30090R1

Dear Dr. Zhou,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Andrew R Dykstra
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: All comments have been addressed. Please check new text for typos, e.g.:
Line 360: inlcuded -> included
Line 725: exisited -> existed

Reviewer #2: The authors have addressed all the comments except I urge them to consider these final recommendations: the DLPFC region in fig 1B is in general not different from f-ROI2 and f-ROI3. I suggest labeling it as f-ROI1 instead.
Also, I suggest the following wordings on ll 260-262: The f-ROI1 most likely encompasses Brodmann area (BA, 9 and 10), f-ROI2, BA 9 and 46, and f-ROI3, BA45 and BA47, which also likely covers the IFG

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Erol J. Ozmeral
Reviewer #2: Yes: Adrian KC Lee, ScD

14 Apr 2022

PONE-D-21-30090R1

Effects of degraded speech processing and binaural unmasking investigated using functional near-infrared spectroscopy (fNIRS)

Dear Dr. Zhou:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Andrew R Dykstra
Academic Editor
PLOS ONE

77 in total

Review 1. Mapping brain asymmetry.

Authors: Arthur W Toga; Paul M Thompson
Journal: Nat Rev Neurosci Date: 2003-01 Impact factor: 34.870

2. Dorsolateral prefrontal contributions to human working memory.

Authors: Aron K Barbey; Michael Koenigs; Jordan Grafman
Journal: Cortex Date: 2012-06-16 Impact factor: 4.027

3. Pupil response as an indication of effortful listening: the influence of sentence intelligibility.

Authors: Adriana A Zekveld; Sophia E Kramer; Joost M Festen
Journal: Ear Hear Date: 2010-08 Impact factor: 3.570

4. Wavelet based motion artifact removal for Functional Near Infrared Spectroscopy.

Authors: Behnam Molavi; Guy A Dumont
Journal: Annu Int Conf IEEE Eng Med Biol Soc Date: 2010

5. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest.

Authors: Rahul S Desikan; Florent Ségonne; Bruce Fischl; Brian T Quinn; Bradford C Dickerson; Deborah Blacker; Randy L Buckner; Anders M Dale; R Paul Maguire; Bradley T Hyman; Marilyn S Albert; Ronald J Killiany
Journal: Neuroimage Date: 2006-03-10 Impact factor: 6.556

6. Brain stem and cortical mechanisms underlying the binaural masking level difference in humans: an auditory steady-state response study.

Authors: Winnie Y S Wong; David R Stapells
Journal: Ear Hear Date: 2004-02 Impact factor: 3.570

7. A temporal comparison of BOLD, ASL, and NIRS hemodynamic responses to motor stimuli in adult humans.

Authors: T J Huppert; R D Hoge; S G Diamond; M A Franceschini; D A Boas
Journal: Neuroimage Date: 2005-11-21 Impact factor: 6.556

8. Hearing Impairment and Cognitive Energy: The Framework for Understanding Effortful Listening (FUEL).

Authors: M Kathleen Pichora-Fuller; Sophia E Kramer; Mark A Eckert; Brent Edwards; Benjamin W Y Hornsby; Larry E Humes; Ulrike Lemke; Thomas Lunner; Mohan Matthen; Carol L Mackersie; Graham Naylor; Natalie A Phillips; Michael Richter; Mary Rudner; Mitchell S Sommers; Kelly L Tremblay; Arthur Wingfield
Journal: Ear Hear Date: 2016 Jul-Aug Impact factor: 3.570

9. Neural Correlates of the Binaural Masking Level Difference in Human Frequency-Following Responses.

Authors: Christopher G Clinard; Sarah L Hodgson; Mary Ellen Scherer
Journal: J Assoc Res Otolaryngol Date: 2016-11-28

10. The neural substrate for binaural masking level differences in the auditory cortex.

Authors: Heather J Gilbert; Trevor M Shackleton; Katrin Krumbholz; Alan R Palmer
Journal: J Neurosci Date: 2015-01-07 Impact factor: 6.167

1 in total

1. Side-of-Implantation Effect on Functional Asymmetry in the Auditory Cortex of Single-Sided Deaf Cochlear-Implant Users.

Authors: Anna Weglage; Verena Müller; Natalie Layer; Khaled H A Abdel-Latif; Ruth Lang-Roth; Martin Walger; Pascale Sandmann
Journal: Brain Topogr Date: 2022-06-07 Impact factor: 4.275

1 in total