Effects of word familiarity and receptive vocabulary size on speech-in-noise recognition among young adults with normal hearing.

Meredith D Braza¹,², Heather L Porter², Emily Buss¹, Lauren Calandruccio³, Ryan W McCreery², Lori J Leibold².

Abstract

Having a large receptive vocabulary benefits speech-in-noise recognition for young children, though this is not always the case for older children or adults. These observations could indicate that effects of receptive vocabulary size on speech-in-noise recognition differ depending on familiarity of the target words, with effects observed only for more recently acquired and less frequent words. Two experiments were conducted to evaluate effects of vocabulary size on open-set speech-in-noise recognition for adults with normal hearing. Targets were words acquired at 4, 9, 12 and 15 years of age, and they were presented at signal-to-noise ratios (SNRs) of -5 and -7 dB. Percent correct scores tended to fall with increasing age of acquisition (AoA), with the caveat that performance at -7 dB SNR was better for words acquired at 9 years of age than earlier- or later-acquired words. Similar results were obtained whether the AoA of the target words was blocked or mixed across trials. Differences in word duration appear to account for nonmonotonic effects of AoA. For all conditions, a positive correlation was observed between recognition and vocabulary size irrespective of target word AoA, indicating that effects of vocabulary size are not limited to recently acquired words. This dataset does not support differential assessment of AoA, lexical frequency, and other stimulus features known to affect lexical access.

Year: 2022 PMID: 35271608 PMCID: PMC8912124 DOI: 10.1371/journal.pone.0264581

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

One challenge for understanding group differences in the ability to recognize speech in noise is that there are often substantial individual differences in performance even among young adults with normal hearing. Listeners can differ with respect to linguistic, cognitive, and perceptual abilities, resulting in differences in lexical access speed, verbal working memory, rhythm perception, inhibition, and vocabulary size [1, 2]. However, the influence of listener factors on speech-in-noise recognition appears to depend on characteristics of the stimuli used to evaluate performance. These stimulus characteristics include age of acquisition (AoA), lexical frequency, and phonotactic probability of the target speech materials, as well as signal-to-noise ratio (SNR) and masker type [3-6]. Beyond their contribution to accuracy, speech-related and listener-related factors also modulate the processing cost associated with perceiving speech in noise [7].

The relationship between a listener’s receptive vocabulary knowledge and their ability to recognize speech in noise has been of particular interest to auditory researchers [3, 8–10]. An association between vocabulary size and speech-in-noise recognition has been observed for a range of target stimuli including meaningful sentences [11], semantically anomalous sentences [3, 12, 13], and isolated words [14]. A positive correlation between vocabulary size and speech recognition performance has been observed for children and adults who are hard of hearing [2, 15, 16] and for young school-age children with normal hearing [2, 15–17]. However, this association is not always observed for adults [18] or for older school-age children with normal hearing [17]. Additional evidence supporting a link between receptive vocabulary and speech recognition in adverse listening conditions comes from studies investigating adults’ understanding of speech produced by a talker with a speech disorder or with an unfamiliar dialect or accent [12, 19]. 
For example, Banks and colleagues evaluated sentence recognition in speech-shaped noise using an adaptive tracking procedure [19]. Participants were young adults with normal hearing who were native speakers of British English. Sentences produced by a native speaker of British English were digitally manipulated to create an unfamiliar accent. Vocabulary scores were negatively associated with speech recognition thresholds (SRTs); adults with larger vocabularies tended to have lower SRTs than adults with smaller vocabularies. Based on the results of these studies and those described above, it has been suggested that individual differences in language experience and/or lexical processing abilities contribute to the substantial variability in performance observed across listeners for a wide range of challenging listening conditions [9, 12, 19].

It has been posited that individual differences in knowledge of linguistic structure mediate the relationship between vocabulary size and speech-in-noise recognition [3, 20]. By this account, listeners vary in their implicit language knowledge (e.g., statistical properties of speech), which supports reconstruction of target speech that has been degraded by competing noise [21]. Support for this view was provided by Fletcher and colleagues, who showed a positive relationship between vocabulary size and speech-in-noise recognition at a range of SNRs in adults with normal hearing [3]. Of particular interest, the strength of this association varied nonmonotonically with SNR. The effect was largest at 1 dB SNR, corresponding to an average of approximately 67% correct recognition. A weaker association was observed at a more advantageous SNR of 4 dB (81% correct) and at less advantageous SNRs of -2 dB (45% correct) and -5 dB (20% correct). The authors suggested this nonmonotonic pattern of results reflected maximal benefit of implicit language knowledge when speech cues are present but moderately degraded by background noise. 
The effect of receptive vocabulary on speech-in-noise recognition may also depend on the listener’s familiarity with the target speech stimuli used to evaluate masked speech recognition abilities, where greater familiarity is associated with higher lexical frequency and earlier age of acquisition [14]. Specifically, listeners may require less acoustic-phonetic information to recognize familiar words than unfamiliar words. School-age children with larger vocabularies tend to perform better on speech-in-noise tasks than children with smaller vocabularies when target stimuli are later-acquired words, but this relationship is not always observed when the target stimuli are early-acquired words [6, 17]. It is not clear whether this pattern of results extends to adults. When listening to speech in the presence of background noise, language demands for adults may be less than for children who are still developing early language skills. Early-acquired words tend to occur more frequently in spoken language relative to late-acquired words [22], and we know that lexical frequency affects recognition in both adults [3] and children [22]. Lexical frequency may therefore play a role in recognition of early- and later-acquired words. Understanding the relationship between receptive vocabulary and speech-in-noise recognition when target words differ by AoA and lexical frequency may help us interpret individual differences in results from speech-in-noise testing obtained in research and clinical settings. The purpose of the present study was therefore to further our understanding of the effects of listener and stimulus factors on speech-in-noise recognition. Receptive vocabulary size and percent correct recognition of words presented in speech-shaped noise were estimated for adults with normal hearing. Performance was evaluated at two SNRs using words with an average AoA of 4, 9, 12, or 15 years. 
The motivation for manipulating SNR was to increase opportunities for observing effects of linguistic knowledge, which may vary with task difficulty [3]. For Experiment 1, word lists were mixed with respect to AoA. For Experiment 2, lists were either mixed or blocked by AoA to see if listener expectation regarding AoA affected performance. Experiment 3 used data collected in the previous two experiments to evaluate possible effects of word duration. Two main predictions were made. First, noise-masked word recognition was predicted to be better for early-acquired words than later-acquired words. Second, based on previous findings in children [6, 17], the strength of the association between receptive vocabulary size and percent-correct masked word recognition was predicted to be greater for later-acquired words than early-acquired words.

General methods

Participants

Participants were native speakers of American English between 19 and 50 years of age (mean = 31 years), all with self-reported normal hearing and no history of hearing loss. Recruitment was based on word of mouth and included people within social networks of lab staff. Each participant completed testing remotely in a single session lasting about 80 minutes, and none participated in more than one experiment. Participants were reimbursed $15/hour in electronic gift cards. All procedures were approved by the Boys Town National Research Hospital Institutional Review Board (IRB). Hearing status was based on self-report: this remote study was conducted shortly after onset of the COVID-19 pandemic, when few valid remote screening instruments were readily available. Of the participants tested, 43% had documented normal hearing from recent participation in our in-lab studies and/or from being a test subject in the UNC Doctor of Audiology Program. Potential participants were excluded if they reported one or more ear infections in the past year or a history of hearing loss. While we cannot definitively rule out hearing loss in this cohort, concern over this possibility is tempered by the fact that effects of interest were evaluated within subjects.

Remote testing procedures

For each experiment, participants completed two primary tasks: 1) receptive vocabulary testing, and 2) speech-in-noise testing. The order of these tasks was randomly assigned for each participant. Consent was obtained via a secure Webex video conference call. When it was time for the participant to sign, date, or fill out forms, the first author transferred keyboard and mouse control to the participant. Following the consent process, the participant received a remote testing kit which contained: 1) a tablet computer with a touchscreen and charger, 2) Sennheiser HD 25-II headphones, 3) instructions including specification of test order for that participant, and 4) alcohol wipes. The kit was dropped off at the participant’s door, and kits were cleaned with alcohol wipes between uses. Instructions included strategies for ensuring a quiet test environment (e.g., turning off the TV, limiting distractions from family members and pets), when to take breaks, who to contact with questions, how to run each program, the steps for completing each task, and what to do when the tasks were completed. The first author was available to assist via videoconference if participants had questions as they were testing.

Receptive vocabulary assessment

The Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4) [23] was used to quantify receptive vocabulary. Stimuli were recorded in a sound booth by a 49-year-old male talker who is a native speaker of American English. Stimuli were presented via headphones at a comfortable listening level using a custom software program preinstalled on the tablet. In each trial, participants heard a word preceded by the carrier phrase, “Show me,” and then identified the word by selecting the associated illustration on the touchscreen. Each trial was associated with four illustrations, scanned in from a hard copy version of the PPVT-4, including the correct response and three foils. Prior to beginning the task, participants completed a practice trial to ensure that they understood how to perform the task. Trial-by-trial data and summary scores were uploaded automatically to REDCap, a secure web app for storing and managing data [24], following completion of the task.

Speech-in-noise testing

Target stimuli were drawn from a corpus of 240 disyllabic words, selected based on their AoA [25]. There were 60 words in each of four categories based on mean AoA: those acquired at 4 years of age (4.0–4.7 yrs, mean 4.4 yrs), 9 years of age (9.0–9.2 yrs, mean 9.1 yrs), 12 years of age (12.0–12.2 yrs, mean 12.1 yrs), and 15 years of age (14.4–15.9 yrs, mean 15.0 yrs). As expected, these four lists also differed with respect to lexical frequency. Fig 1 shows lexical frequency on a log scale, based on Brysbaert and New [26], plotted as a function of AoA for the four wordlists. One-tailed Welch’s t-tests indicate that log-transformed lexical frequency fell with increasing AoA across all four lists (4 vs. 9 yrs, t(116) = 9.82, p < .001; 9 vs. 12 yrs, t(112) = 2.54, p = .006; 12 vs. 15 yrs, t(117) = 1.89, p = .030). The four lists were balanced with respect to phonetic content. One- and two-phoneme probabilities were computed based on the Phonotactic Probability Calculator of Vitevitch and Luce [27]; those probabilities were not significantly different across lists (p ≥ .223, uncorrected). Target words are reported in S1 Table.

Fig 1

Lexical frequency as a function of age of acquisition.

Symbol fill indicates the AoA category, as defined in the legend. Boxplots indicate the distribution of lexical frequency for each AoA category. Horizontal lines indicate the median, and boxes span the 25th to 75th percentiles.

Target words were produced by a 28-year-old male talker who is a native speaker of American English with no noticeable regional accent; productions were recorded inside a double-walled sound booth. Recordings were made with the talker’s mouth positioned approximately 6 inches in front of a cardioid condenser microphone (Shure KSM 42). The talker was recorded saying the carrier phrase, “Say the word,” prior to each target word. Recordings were made using a TwinFinity 710 preamplifier, M-Audio FastTrack Pro audio interface, and Logic-Pro-X recording software. Recorded words were then edited using Sound Studio to remove all silent periods before and after the recording. The individual files were scaled to equivalent root-mean-square level using MATLAB [28]. Target recordings ranged in duration from 1.2 to 2.4 seconds (mean of 1.8 sec), including the carrier phrase.

The task was open-set word recognition in the presence of speech-shaped noise. Stimuli were presented diotically over headphones. The speech-shaped noise was generated based on the long-term average speech spectrum of the targets, including the carrier phrase. Custom software running on the tablet computer presented stimuli through the onboard soundcard and collected participants’ responses. The target was temporally centered in a 3.4-second sample of masking noise, resulting in at least 500 ms of noise alone at the beginning and end of each stimulus presentation.

The fidelity of sound from the onboard soundcard was evaluated by playing a test stimulus comprising a sequence of 1-sec pure tones, with 20-ms onset and offset ramps. The first seven tones were at octave frequencies 125–8000 Hz, all presented at 75 dB SPL. 
The final four tones were at 1000 Hz, presented at 75, 65, 55, and 45 dB SPL. The output of the soundcard was routed to an oscilloscope. Visual inspection of the output did not reveal any frequency shaping or amplitude compression associated with the soundcard drivers. Prior to testing, participants heard a passage produced by the target talker, and they were asked to adjust the volume to a comfortable listening level. Participants were explicitly instructed not to adjust the volume again for the remainder of the experiment. Upon equipment drop-off for each participant, volume on the tablet was set to 50%, which corresponded to approximately 75 dB SPL at the headphones, as measured using a 6-cc coupler and precision sound level meter (Larson Davis Model 824). Based on self-report, very few participants adjusted the volume from this pre-set level. Next, participants listened to instructions recorded by the target talker. The instructions described the task and directed the listener to ignore the background noise and listen to the target talker. Participants were also instructed to repeat each word they heard aloud following each trial. Following each verbal response, participants were instructed to type their response into the response box. Data were automatically uploaded to REDCap [24].
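The timing and level relationships in this section can be illustrated with a short sketch. The function names and the example RMS value are hypothetical, and the study's custom software is not described at this level of detail; the code only shows the arithmetic implied by centering a target in a 3.4-second masker and by setting a given SNR.

```python
NOISE_MS = 3400  # masker duration reported in the text (3.4 s)

def noise_padding_ms(target_ms, noise_ms=NOISE_MS):
    """Noise-alone time (ms) before and after a temporally centered target."""
    return (noise_ms - target_ms) / 2

def masker_rms_for_snr(target_rms, snr_db):
    """Masker RMS that yields the requested SNR (dB) for a given target RMS."""
    # SNR (dB) = 20 * log10(target_rms / masker_rms)
    return target_rms / (10 ** (snr_db / 20))

# The longest recording (2.4 s, carrier phrase included) leaves 500 ms per side;
# the shortest (1.2 s) leaves 1100 ms per side.
print(noise_padding_ms(2400))   # 500.0
print(noise_padding_ms(1200))   # 1100.0

# With a hypothetical target RMS of 1.0, the masker RMS is about 1.78 times
# the target RMS at -5 dB SNR and about 2.24 times at -7 dB SNR.
for snr in (-5, -7):
    print(snr, round(masker_rms_for_snr(1.0, snr), 2))
```

The 500-ms minimum stated in the text follows from the longest recording; shorter recordings are flanked by proportionally more noise.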

Data analysis

Linear regression models were used to evaluate logit-transformed percent correct data, and logistic regression was used to evaluate trial-by-trial responses. Scores on the PPVT and logit-transformed word scores were represented as z-scores to facilitate comparison of effect sizes. All models included a random intercept for each participant. T-tests were evaluated two-tailed unless otherwise indicated, with a significance criterion of α = .05. The Akaike Information Criterion (AIC) and chi-square tests were used to compare alternative models.
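As a minimal sketch of the two transforms named above, the following shows a logit transform of percent-correct scores followed by z-scoring. The function names and example scores are hypothetical. The half-trial adjustment for scores of 0% or 100% is an assumption made here to keep the logit finite; the text does not say how, or whether, such scores were handled.

```python
import math
from statistics import mean, stdev

def logit_percent(percent, n_trials):
    """Logit of a percent-correct score.

    Scores of 0% or 100% are pulled in by half a trial so the logit is
    finite; this adjustment is an assumption, not the authors' stated method.
    """
    p = percent / 100.0
    lo, hi = 0.5 / n_trials, 1 - 0.5 / n_trials
    p = min(max(p, lo), hi)
    return math.log(p / (1 - p))

def z_scores(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical percent-correct scores for five listeners, 30 trials each.
scores = [20.0, 35.0, 50.0, 65.0, 80.0]
logits = [logit_percent(s, 30) for s in scores]
print([round(z, 2) for z in z_scores(logits)])
```

Expressing both the predictor (PPVT) and the outcome in z-score units means that a regression coefficient can be read as a change in standard deviations of the outcome per standard deviation of the predictor.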

Experiment 1

Twenty-seven participants (9 males) completed Experiment 1. The average age of these participants was 32 years (20–49 yrs). For speech-in-noise testing, each participant heard four lists of 60 words. Each list contained an equal number of words from each AoA category (15 words with AoA of 4 yrs, 9 yrs, 12 yrs, and 15 yrs). Two lists were played at -5 dB SNR, and two lists were played at -7 dB SNR. The lists for each SNR and list order were randomly assigned for each participant.
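One way such balanced mixed lists could be assembled is sketched below. The procedure actually used to construct the lists is not described, so the function name, the placeholder word labels, and the random seed are all hypothetical; the sketch only demonstrates the constraint that each 60-word list holds 15 words from each AoA category.

```python
import random

AOA_CATEGORIES = (4, 9, 12, 15)   # mean AoA of each word category, in years
WORDS_PER_CATEGORY = 60           # 240-word corpus, 60 words per category
N_LISTS = 4
PER_CATEGORY_PER_LIST = WORDS_PER_CATEGORY // N_LISTS  # 15

def build_mixed_lists(corpus, rng):
    """Split each AoA category evenly across lists, then shuffle each list."""
    lists = [[] for _ in range(N_LISTS)]
    for aoa in AOA_CATEGORIES:
        words = list(corpus[aoa])
        rng.shuffle(words)
        for i in range(N_LISTS):
            start = i * PER_CATEGORY_PER_LIST
            chunk = words[start:start + PER_CATEGORY_PER_LIST]
            lists[i].extend((aoa, w) for w in chunk)
    for lst in lists:
        rng.shuffle(lst)  # mix AoA categories within the list
    return lists

# Placeholder labels stand in for the real target words (see S1 Table).
corpus = {aoa: [f"word_{aoa}_{k}" for k in range(WORDS_PER_CATEGORY)]
          for aoa in AOA_CATEGORIES}
rng = random.Random(0)
mixed = build_mixed_lists(corpus, rng)

# Two lists at each SNR, with list-to-SNR assignment randomized per participant.
snrs = [-5, -5, -7, -7]
rng.shuffle(snrs)
for snr, lst in zip(snrs, mixed):
    print(snr, len(lst))
```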

Results and discussion

Panels A1 and A2 of Fig 2 show the distribution of word recognition scores across participants for each AoA (Panel A1) and as a function of receptive vocabulary (PPVT score; Panel A2). Performance was better at -5 dB SNR than -7 dB SNR, and performance tended to be better for early-acquired words than later-acquired words. The one deviation from this trend was observed for early-acquired words presented at -7 dB SNR, where there was an unexpected trend for poorer performance for words with an AoA of 4 years than 9 years.

These trends were evaluated with a linear regression model, with SNR and AoA represented as categorical factors. Participants’ PPVT scores were included as a continuous variable. Results are reported in Table 1. The effect of receptive vocabulary fell just short of the significance criterion (p = .051), and there was a significant effect of SNR (p < .001). At -5 dB SNR, there was no difference between performance with AoA of 4 years compared to 9 years (p = .443), but there were differences compared to 12 and 15 years (p < .001). There was a significant interaction between SNR and AoA of 15 years (p = .026) and a non-significant trend for an interaction between SNR and AoA of 9 years (p = .068). Two-tailed paired t-tests for the -7 dB SNR data indicate that performance was worse for AoA of 4 years than 9 years (p < .001), better for 9 years than 12 years (p < .001), and not significantly different for 12 years and 15 years (p = .621). Interactions between receptive vocabulary and AoA were not significant (p ≥ .202). This result fails to support the hypothesis that effects of vocabulary size depend on AoA, although the estimated benefit of a larger vocabulary was numerically greater for later-acquired words.

Fig 2

Word scores by AoA and PPVT scores.

Table 1

Linear mixed model for results of Experiment 1.

	Value	SEM	DF	t-value	p-value
(Intercept)	1.029	0.122	179	8.45	< .001
PPVT	0.250	0.122	25	2.05	.051
AoA(9yr)	0.107	0.139	179	0.77	.443
AoA(12yr)	-0.501	0.139	179	-3.61	< .001
AoA(15yr)	-0.928	0.139	179	-6.69	< .001
SNR(-7)	-1.336	0.139	179	-9.63	< .001
PPVT x AoA(9yr)	0.050	0.120	179	0.42	.674
PPVT x AoA(12yr)	0.043	0.120	179	0.36	.723
PPVT x AoA(15yr)	0.153	0.120	179	1.28	.202
AoA(9yr) x SNR(-7)	0.360	0.196	179	1.84	.068
AoA(12yr) x SNR(-7)	0.083	0.196	179	0.42	.674
AoA(15yr) x SNR(-7)	0.440	0.196	179	2.24	.026

This analysis includes fixed effects of PPVT score (z-score), AoA (reference = 4 yr), and SNR (reference = -5 dB). Interactions in the model included PPVT x AoA and AoA x SNR. Each row contains information about a parameter or the interaction between parameters, with factor levels indicated in parentheses.
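To make the dummy-coded estimates in Table 1 easier to read, the sketch below sums the relevant fixed effects for each AoA x SNR cell. The coefficient values are copied from Table 1; the function and variable names are hypothetical, random intercepts are omitted, and predictions are in the model's units (z-scored, logit-transformed word scores) for a listener with an average PPVT score unless `ppvt_z` is changed.

```python
# Fixed-effect estimates copied from Table 1 (z-scored, logit-transformed scores).
INTERCEPT = 1.029                      # AoA 4 yr, -5 dB SNR, PPVT z = 0
AOA = {4: 0.0, 9: 0.107, 12: -0.501, 15: -0.928}
SNR_M7 = -1.336                        # shift from -5 to -7 dB SNR
AOA_X_SNR_M7 = {4: 0.0, 9: 0.360, 12: 0.083, 15: 0.440}
PPVT = 0.250
PPVT_X_AOA = {4: 0.0, 9: 0.050, 12: 0.043, 15: 0.153}

def predict(aoa, snr_db, ppvt_z=0.0):
    """Fixed-effects prediction for one AoA x SNR cell (random effects omitted)."""
    y = INTERCEPT + AOA[aoa] + (PPVT + PPVT_X_AOA[aoa]) * ppvt_z
    if snr_db == -7:
        y += SNR_M7 + AOA_X_SNR_M7[aoa]
    return y

for snr in (-5, -7):
    print(snr, {aoa: round(predict(aoa, snr), 3) for aoa in AOA})
# -5 {4: 1.029, 9: 1.136, 12: 0.528, 15: 0.101}
# -7 {4: -0.307, 9: 0.16, 12: -0.725, 15: -0.795}
```

At -7 dB SNR the predicted value for 9-year words (0.160) lies above that for 4-year words (-0.307), which is the nonmonotonic pattern described in the text; at -5 dB SNR the two are close.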

Panels A1 and B1 show the distribution of word scores, plotted as a function of AoA. The horizontal lines indicate the medians, boxes span the 25th to 75th percentiles, and whiskers span the 10th to 90th percentiles. Panels A2 and B2 show the mean word score for individual listeners, plotted as a function of their receptive vocabulary (PPVT score). Lines indicate the association between percent correct and receptive vocabulary. Box and symbol fill reflect SNR (A1 & A2, Exp 1) or mixed vs. blocked wordlists (B1 & B2, Exp 2), as defined in the legend. All testing in Exp 2 was conducted at -7 dB SNR.

One question remaining at the end of Experiment 1 was whether the trend for a nonmonotonic effect of AoA on word recognition at -7 dB SNR was a chance finding, or if this effect could be replicated. A nonmonotonic effect of AoA at -7 dB SNR was not anticipated at the outset, but one post-hoc explanation for this result has to do with listener expectation. Adults tested in this protocol might have prior expectations regarding the type of target words they were listening for, and those expectations could in turn affect performance, resulting in relatively better performance for words that conformed to those expectations as compared to those that deviated from expectations. If participants were expecting to hear later-acquired words, this could result in relatively poor performance for words with a mean AoA of 4 years. It is not clear why listeners would form such an expectation, but the inclusion of a vocabulary assessment in the test protocol might be taken as evidence that this experiment was about recognition of more advanced vocabulary. 
This possibility was evaluated in Experiment 2 by providing participants with information about the types of target words to expect in select blocks of trials.

Experiment 2

Twenty participants (7 males) completed Experiment 2. The average age of these participants was 30 years (20–40 yrs). Participants heard six lists of words, all played at -7 dB SNR. The first two lists contained 60 words, each with an equal number of words from the four AoA categories, as in Experiment 1. The other four lists contained 30 words each and were blocked by AoA, such that each list contained words from only one of the AoA categories. Participants heard the two mixed lists first (Lists A and B, order randomized) followed by the four blocked lists (4, 9, 12, and 15 yrs AoA, order randomized). Prior to each blocked list, participants saw three words on the computer monitor exemplifying the AoA of the subsequent list; those example words had been previously considered for inclusion in the set of 240 targets but were not ultimately chosen for inclusion in that corpus. Other aspects of the stimuli and test procedures were the same as described for Experiment 1.

Experiment 2 was conducted to determine whether the nonmonotonic effect of AoA at -7 dB SNR observed in Experiment 1 could be replicated with a new cohort of participants, and to evaluate the role of listener expectation on performance. The right column of panels in Fig 2 shows the distribution of word scores across listeners for each AoA (Panel B1) and as a function of receptive vocabulary (PPVT score; Panel B2). Performance was similar for the mixed and blocked trials, and the effect of AoA was consistent with the -7 dB data from Experiment 1. This was confirmed with a linear regression model, with AoA and predictability of AoA (mixed vs. blocked) represented as categorical factors. PPVT scores were subjected to a z-transform and included as a continuous variable. Results appear in Table 2. There was a significant effect of receptive vocabulary (p = .008). The effect of AoA predictability (mixed vs. blocked) was not significant (p = .322). 
For the mixed condition, performance was significantly worse for AoA of 4 years than 9 years (p = .003), and significantly better for AoA of 4 years than either 12 years (p = .017) or 15 years (p = .001). There was no interaction between predictability and AoA (p ≥ .506). Given the lack of an effect of predictability, the mixed and blocked data were averaged, and the result was used to evaluate the effect of AoA via two-tailed paired t-tests. As observed in Experiment 1, performance was worse for an AoA of 4 years than 9 years (p < .001), better for an AoA of 9 years than 12 years (p < .001), and no difference was observed for an AoA of 12 years and 15 years (p = .516).

Table 2

Linear mixed model for results of Experiment 2.

	Value	SEM	DF	t-value	p-value
(Intercept)	-0.396	0.151	133	-2.63	.010
PPVT	0.219	0.073	18	3.00	.008
Pred(Blocked)	0.184	0.185	133	1.00	.322
AoA(9yr)	0.553	0.185	133	2.99	.003
AoA(12yr)	-0.447	0.185	133	-2.42	.017
AoA(15yr)	-0.631	0.185	133	-3.42	.001
Pred(Blocked) x AoA(9yr)	-0.174	0.261	133	-0.67	.506
Pred(Blocked) x AoA(12yr)	-0.121	0.261	133	-0.46	.644
Pred(Blocked) x AoA(15yr)	0.085	0.261	133	0.33	.744

This analysis includes fixed effects of PPVT score (z-score), AoA predictability (reference = mixed) and AoA (reference = 4 yr), as well as the interaction between predictability and AoA. Each row contains information about a parameter or the interaction between parameters, with factor levels indicated in parentheses.

This experiment confirmed the nonmonotonic effect of AoA on word recognition for the -7 dB SNR presentation level for both mixed and blocked presentation, but it did not offer any possible explanation for that result. One consideration is whether the word lists used in the present set of experiments differed in ways that could have affected performance apart from differences in AoA. While the four word lists were balanced for one- and two-phoneme probabilities, there are many other factors that were not explicitly controlled and may have differed. Experiment 3 evaluated one such feature, target word duration.

Experiment 3

The final experiment evaluated whether target word duration affects recognition for this stimulus set. Stimulus duration was evaluated by manually marking target word boundaries for each recording. The geometric mean of duration was 660 ms for an AoA of 4 years (IQR: 584–760 ms), 805 ms for an AoA of 9 years (IQR: 708–934 ms), 781 ms for an AoA of 12 years (IQR: 671–903 ms), and 739 ms for an AoA of 15 years (IQR: 663–828 ms). Using uncorrected two-tailed t-tests, the 4-year words were significantly shorter than words in any of the other three AoA categories (p ≤ .009). The 9-year words were significantly longer than the 15-year words (p = .006) but not the 12-year words (p = .422). There was a non-significant trend for 12-year words to be longer than 15-year words (p = .060). Differences in duration across lists mirror the nonmonotonic pattern of performance as a function of AoA observed in the previous two experiments for the -7 dB SNR presentation level.

To evaluate the possible role of target word duration in performance, data were combined for the -7 dB SNR conditions in Experiment 1 and the mixed conditions of Experiment 2. This combined dataset included 47 participants (27 from Experiment 1 and 20 from Experiment 2). An analysis of trial-by-trial data was conducted using logistic regression, with random effects of subject and word. Fixed effects were the log transform of target word duration in ms, AoA (a categorical variable), PPVT (a continuous variable), and the interaction between PPVT and AoA. The results of this model are shown in Table 3. This model indicates a significant effect of PPVT (p = .044) and a significant effect of word duration (p = .019). Performance for words acquired at 4 and 9 years of age was not significantly different when word duration was included in the model (p = .507). This result is consistent with the idea that the longer duration of words acquired at 9 years of age could be responsible for the nonmonotonic performance as a function of AoA. 
In contrast to previous analyses, worsening performance for words acquired at 12 and 15 years of age approached but did not reach significance in this analysis (p ≥ .055). One caveat is that evaluating AoA as a categorical variable does not capture the ordered prediction associated with increasing AoA (e.g., that effects of AoA for 12-year words should be intermediate between 9-year and 15-year words). A second model with log of lexical frequency (a continuous variable) in place of AoA (a categorical variable) resulted in a reduction of the AIC and a non-significant change in model fit (χ²(4) = 2.63, p = .620). This result indicates that we cannot differentiate between effects of AoA and effects of lexical frequency in this dataset.
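The p-value attached to the model comparison can be checked by hand, because the chi-square upper-tail probability has a closed form when the degrees of freedom are even. The sketch below is a sanity check on the reported statistic, not the authors' analysis code, and the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic for even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))

p = chi2_sf_even_df(2.63, 4)
print(round(p, 2))   # 0.62
```

The result agrees with the reported p = .620 to within the rounding of the test statistic.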

Table 3

Linear mixed model for combined data from Experiment 1 and 2, collected using -7 dB SNR level and mixed presentation.

	Value	SEM	z-value	p-value
(Intercept)	-6.897	2.767	-2.493	0.013
PPVT	0.196	0.097	2.015	0.044
AoA(9yr)	0.263	0.397	0.663	0.507
AoA(12yr)	-0.678	0.392	-1.728	0.084
AoA(15yr)	-0.738	0.384	-1.921	0.055
Duration	1.546	0.657	2.353	0.019
PPVT x AoA(9yr)	0.116	0.105	1.100	0.272
PPVT x AoA(12yr)	0.115	0.107	1.081	0.280
PPVT x AoA(15yr)	0.069	0.106	0.651	0.515

This analysis includes fixed effects of log-transformed target word duration in ms, PPVT score (z-score), AoA (reference = 4 yr), and the PPVT x AoA interaction. Each row contains information about a parameter or the interaction between parameters, with factor levels indicated in parentheses.

Evidence that differences in word duration are responsible for the nonmonotonic effect of AoA observed in Experiments 1 and 2 raises the question of whether these differences are particular to our stimulus set or whether they are representative of AoA-related differences in word duration inherent in the language. Text-to-speech synthesis, implemented in MATLAB [29], was used to address this question. Each word was synthesized separately. The resulting wav files included variable-duration segments of silence before and after the synthesized speech. To replicate the manual splicing used to evaluate stimulus recordings, the envelope was extracted via full-wave rectification and low-pass filtering twice with a 4th-order 40-Hz Butterworth filter (once forward and once backward). The first and last time point that was ≥ 40 dB down from the peak was used to define the beginning and end of the word, respectively.

The first step was to confirm that synthesized speech replicated the mean differences in list duration for the target words. The geometric means of duration by list were 674 ms for an AoA of 4 years (IQR: 564–794 ms), 767 ms for an AoA of 9 years (IQR: 718–864 ms), 710 ms for an AoA of 12 years (IQR: 629–816 ms), and 697 ms for an AoA of 15 years (IQR: 621–781 ms). Across lists, the duration estimated using the text-to-speech algorithm was 4.6% shorter than the duration of the recorded stimuli, and the correlation between these values was r = .63 (p < .001). These results suggest that the differences in duration as a function of AoA for the recorded speech stimuli are due in part to phonetic features of the target words, and that text-to-speech synthesis can be used to characterize these differences.

The second step was to estimate duration as a function of AoA for a larger set of two-syllable words in the Kuperman database. An open-source dictionary maintained by Carnegie Mellon University (CMUDict [30]) was used to define the number of syllables. That corpus contains 133,779 words, of which 61,468 are pronounced as disyllabic. Cross-referencing the two-syllable words from the CMUDict database with the Kuperman database identified 8,381 two-syllable words. These words were synthesized, and duration was estimated using the methods described above. Fig 3 shows estimates of word duration plotted as a function of AoA, with distributions of both parameters indicated in the margins. The solid red line indicates a three-parameter polynomial fit to the natural log transform of duration (y = -1.553e-3 · x² + 3.684e-2 · x - 5.867e-1). All three of these parameters were significantly different from zero. This function predicts increases in duration for words with an AoA up to ~12 yrs, and a modest trend for decreasing duration with further increases in AoA. For the AoA categories in the present experiment, this corresponds to a 55-ms increase in duration between AoAs of 4 and 9 years and a 2-ms decrease between AoAs of 9 and 15 years. These changes are smaller than those observed in recorded stimuli (145 ms increase and 66 ms decrease, respectively), suggesting that the magnitude of differences in word duration for stimuli used in Experiments 1 and 2 may not be representative of all two-syllable words with the associated AoAs.
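The duration changes quoted for the polynomial fit can be reproduced directly from its coefficients. The sketch below assumes that the fitted quantity is the natural log of duration in seconds (the unit used in Fig 3) and that x is AoA in years; the function and variable names are hypothetical.

```python
import math

# Coefficients of the reported fit to ln(duration); duration is assumed to be
# in seconds (as in Fig 3) and AoA in years.
A, B, C = -1.553e-3, 3.684e-2, -5.867e-1

def predicted_duration_ms(aoa_years):
    """Duration (ms) predicted by the quadratic fit to ln(duration in s)."""
    ln_dur = A * aoa_years ** 2 + B * aoa_years + C
    return 1000.0 * math.exp(ln_dur)

peak_aoa = -B / (2 * A)   # vertex of the parabola
rise = predicted_duration_ms(9) - predicted_duration_ms(4)
fall = predicted_duration_ms(9) - predicted_duration_ms(15)

print(round(peak_aoa, 1))   # 11.9
print(round(rise))          # 55
print(round(fall))          # 2
```

Under these assumptions the fitted curve peaks at an AoA of about 11.9 years and yields the 55-ms increase and 2-ms decrease reported in the text.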

Fig 3

Word duration in sec as a function of AoA.

Word duration was estimated using text-to-speech synthesis for two-syllable words from the Kuperman database. Those estimates of duration in seconds are plotted as a function of AoA in years, shown with grey dots. A three-parameter polynomial fit is shown in red, indicating a modest increase in duration with increasing AoA up to 12 years, followed by modest reductions in duration thereafter. The distributions of duration and AoA are indicated in the right and top margins, respectively.


General discussion

The purpose of this study was to evaluate effects of word familiarity, as indexed by AoA, and individual differences in receptive vocabulary on word recognition in noise for young adults with normal hearing. There were two a priori predictions. The first prediction was that participants would perform more poorly on the speech-in-noise task for later-acquired words with a lower lexical frequency than for early-acquired words with a higher lexical frequency. The second prediction was that the strength of the association between vocabulary size and speech-in-noise recognition scores would depend on word familiarity, with larger effects for less familiar words (i.e., those with later AoA and lower lexical frequency). Target word AoA and lexical frequency tend to be correlated [31], and that association was clear in the current stimulus set. Therefore, whereas the word sets were selected based on AoA, and are described as such in the discussion that follows, it is not clear whether AoA, lexical frequency, word duration, or a combination of those factors is responsible for the effects observed.

The prediction that word recognition in noise would be better for early-acquired words than late-acquired words was evaluated by comparing percent-correct speech recognition scores at two different SNRs for words with an average AoA of 4, 9, 12, or 15 years. The findings were in partial agreement with this prediction. Overall word recognition performance was better at -5 dB relative to -7 dB SNR, and percent-correct scores tended to decrease as AoA increased. This pattern of results is consistent with prior findings showing an association between word familiarity and masked speech recognition [3, 14, 32, 33]. For example, Savin [33] estimated speech-in-noise thresholds in a group of young adults using words that spanned a wide range with respect to frequency of occurrence and word length.
For relatively short words (e.g., monosyllabic and disyllabic words), SRTs were lower for frequently occurring words than infrequently occurring words.

Although there was a general trend for speech-in-noise scores to decrease with increasing AoA, better performance for words with an average AoA of 9 years than 4 years was observed at -7 dB SNR. This pattern of results was not observed at the more advantageous SNR of -5 dB; similar speech-in-noise performance was observed for words acquired at 4 and 9 years at -5 dB SNR. The nonmonotonic effect of AoA at -7 dB SNR was observed for mixed and fixed blocks, inconsistent with the idea that listener expectation was responsible for the higher masked speech recognition scores observed for the 9- versus 4-year-old words. Recall that Fletcher and colleagues also observed a level effect in the relationship between AoA and word recognition [3]. However, the details of that effect were not the same as observed in the present study. Whereas Fletcher and colleagues observed a larger effect of AoA at a moderate SNR (corresponding to ~67% correct performance), we observed different effects for words acquired at 4 and 9 years of age despite similar percent correct scores at -5 dB SNR.

One potential explanation for the unexpected difference in performance between words acquired at 4 and 9 years at the more challenging SNR is that the 9-year-old words were longer than the 4-year-old words. An analysis of target word duration determined that words acquired at 4 years of age tended to be shorter than those acquired later, and words acquired at 9 years tended to be longer than those acquired at 15 years of age. Including word duration in a statistical model of trial-by-trial responses indicates no significant difference between recognition of words acquired at 4 years and 9 years of age. This suggests that the nonmonotonic effect of AoA observed in mean data could be due to differences in word duration across lists.
Better recognition of longer words could be related to prior data indicating better performance for slower speaking rates and for words with more syllables [33, 34]. Analysis of a larger set of two-syllable words suggests that there is a nonmonotonic pattern of word duration as a function of AoA in English, but that the magnitude of duration differences observed with the stimuli used for Experiments 1 and 2 is larger than expected based on this analysis. A major goal of this study was to evaluate the relationship between receptive vocabulary and masked speech recognition, considering the AoA of target words used to assess speech-in-noise performance. We anticipated a greater positive correlation between receptive vocabulary and percent-correct word recognition scores for later-acquired words relative to early-acquired words. This prediction was based on prior studies investigating adults’ speech perception abilities under adverse listening conditions [3, 12, 19], and on studies of masked speech recognition in both children with normal hearing and children who are hard of hearing [6, 17]. In these studies, target stimuli were selected to fall within the lexicon of the youngest listeners. Thus, one explanation offered for the lack of an association between receptive vocabulary size and masked speech recognition performance for older children and adults in the earlier studies was that having a large receptive vocabulary is less beneficial for speech-in-noise recognition when using highly familiar target speech [6, 17]. In contrast to our initial prediction, there was a comparable positive relationship between receptive vocabulary size and masked word recognition performance for all four AoA categories and for both SNRs examined. While unexpected, these results are in agreement with findings from previous studies in which young adults with larger vocabularies showed an advantage when listening in noise relative to young adults with smaller vocabularies [2, 13]. 
One implication of these results is that limiting the test corpus to early-acquired words may not reduce effects of linguistic knowledge, as is often assumed when testing speech perception in a clinical setting. Whereas the present study found a consistent effect of vocabulary size on young adults’ recognition of early- and later-acquired target words, several previous studies have reported effects of vocabulary size only for later-acquired targets [6, 17]. One potential explanation for discrepancies observed across studies was suggested over 50 years ago by Savin [33], who examined effects of lexical frequency on adults’ word recognition in noise using words that varied in length from 1 to 8 syllables. While a strong association between lexical frequency and speech-in-noise performance was observed for short words, almost no effect of lexical frequency was observed for longer words. Savin posited that listeners hear each syllable in a long word in the context of the other syllables, improving speech recognition performance in a manner similar to that observed for words in semantically meaningful sentences [35]. Most previous studies that failed to show a consistent association between receptive vocabulary size and speech-in-noise recognition used sentence-length materials [6, 17], which might tend to underestimate the effects of vocabulary size because of the availability of sentence-level semantic and syntactic cues. The current study had several limitations that could be addressed in future studies. Testing participants with wider variability in receptive vocabulary size could help generalize the role of receptive vocabulary on speech-in-noise performance to a broader population of listeners. Participants in this study had mean PPVT scores ranging from 198 to 225 (mean = 214), corresponding to the 30th to 99th percentiles for this age group (mean = 72.5th percentile). 
Measuring percent correct word recognition based on keywords in sentences rather than single words could shed light on the conditions under which effects of receptive vocabulary are observed. Finally, the present study focused on word familiarity, as indexed by AoA, but it is unclear whether familiarity is the dominant factor responsible for better recognition of early-acquired words. There is a rich literature in psycholinguistics on the differential contributions of plasticity, cumulative word frequency, and semantic structure of lexical representation (reviewed by [36]), as well as effects of phonetic similarity across words and semantic richness [31, 37]. It is likely that selecting test words based on AoA affects results via multiple factors [38]. More detailed characterization of the test stimuli in future studies of masked speech recognition could help clarify the differential contributions of lexical neighborhood density, imageability, semantic richness, current and cumulative frequency, and AoA. The results of the present study contribute to the growing body of evidence that remote testing is a feasible method of gathering data on speech recognition [39-41]. Laboratory hardware was delivered to listeners’ homes, limiting inconsistencies related to the use of personal computers and headphones [39]. Remote testing options are appealing beyond the COVID-19 pandemic, particularly for those without access to traditional laboratory environments and/or when recruiting participants across a wide geographical region. One implication of this study is that word familiarity, as indexed by AoA, is an important consideration when creating and interpreting clinical tests of masked word recognition. This study demonstrates that even for young adults with relatively large vocabularies, familiar words (those acquired early and/or with higher lexical frequency) are easier to recognize than less familiar words (those acquired later and/or with lower lexical frequency). 
After controlling for differences in word duration, there was a monotonic trend for poorer performance with increasing AoA. There also may be differing effects of AoA based on the SNR of the stimuli. As such, it may be useful to test masked word recognition using a range of target word AoAs and SNRs to get a full picture of how these factors influence speech recognition ability. Considering these individual differences can also help clinicians and researchers more accurately interpret speech-in-noise recognition results.

Target words in the four AoA categories.

Disyllabic target words acquired at 4 years, 9 years, 12 years, and 15 years of age. (DOCX)

9 Dec 2021

PONE-D-21-35370

Effects of Word Familiarity and Receptive Vocabulary Size on Speech-in-Noise Recognition Among Young Adults with Normal Hearing

PLOS ONE Dear Dr. Leibold, Thank you for submitting your manuscript to PLOS ONE. I have reviews from one expert in the field, but have been unable to find another scientist with the required expertise to review your manuscript. I have read it myself, and decided to make a decision with the input of the one reviewer and my own assessment. It is my opinion that it would be highly unlikely that input from another reviewer would lead to a different decision, and I don't want you to have to wait any longer.

Both the reviewer and I concluded that your study was conducted in largely an appropriate manner and the manuscript itself is well written. Nonetheless, we do have recommendations for you to consider in a revision. Therefore, I invite you to submit a revised version of the manuscript after you carefully consider the points raised during the review process and make the changes that you deem appropriate. Please submit your revised manuscript by Jan 23 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Susan Nittrouer, Ph.D. Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf. 2. 
We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. 3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. Additional Editor Comments: This was a reasonable study. The Introduction was especially well written. The purpose was to examine whether word familiarity and receptive vocabulary size affect speech recognition thresholds for young adults with normal hearing listening to isolated words. Although the study is fine as it is, I had two concerns, both related to the facts that I was not especially fond of the way word familiarity or vocabulary size were assessed and benchmarked. Age of acquisition is an extremely broad term and might not apply to each specific listener in the same manner; instead, age of acquisition may differ across individuals. The PPVT is a relatively insensitive measure of vocabulary size because one need only to have a low level of familiarity with a word in order to be able to select the picture that represents that word. 
Even measures of expressive vocabulary tap into a slightly deeper level of vocabulary knowledge than receptive vocabulary measures because the subject must retrieve the word from his/her own lexicon. David Pisoni developed a task that derives a more sensitive measure of the depth of a subject’s lexical knowledge; it is the Word Fam task. I highly recommend that task for future work. I also had a slightly different interpretation of the (weak) age-of-acquisition effects that were found. Early-acquired words – which are high frequency, as the authors suggest – may not require listeners to recover phonological structure to the same extent as later-acquired, more ‘difficult’ words. Therefore, less may be needed in terms of acoustic-phonetic information in order to recognize early acquired words. That idea might be important to interpretation of the data obtained in this study, especially if we entertain another notion often invoked when discussing speech-in-noise recognition, and that is the idea of listening in the dips. Although speech-shaped noise has few temporal or spectral dips, it does have some. Later-acquired, less-familiar words may require the listener to recover more “bits” of signal information from across the spectrum and temporal structure than earlier-acquired, more-familiar words. And listeners’ abilities to recover those signal bits and integrate them might help explain individual differences in speech-in-noise recognition. My final comment was that I found the Table 1 difficult to understand. It needs to be revised so that readers can immediately recognize what each row represents. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. 
Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The manuscript describes the results of studies of the effects of vocabulary size on open-set speech-in-noise recognition for adults with normal hearing. 
Data were collected using relatively standard methods, adapted to reflect Covid 19 restrictions. The general finding was that larger vocabulary size was associated with better recognition of speech in noise, consistent with some previous reports from others. A hypothesis that recognition of speech in noise would be associated with age of acquisition (or lexical frequency, with which it is correlated) was not supported. Because of Covid 19, testing was done in the listeners’ home using tablet computers equipped for that purpose. Some sort of accommodation was unavoidable, and that was a reasonable choice that is preferable in some ways to alternatives such as testing online. However, it is not a perfect solution, and I’ve mentioned a few issues in the Specific Comments. Specific comments: Line 122: It would be preferable in a study like this to confirm that the listeners had normal hearing. It seems likely that the authors relied on self-report of hearing status because testing was done remotely, but it would not have been difficult to include a simple pure-tone screening app on the tablet used for the actual study so that’s not a convincing explanation. However, screening would have required better control of absolute presentation level, which listeners were “asked to adjust” (line 197). This situation cannot be changed, but perhaps a brief justification for the choices that were made could be provided. Line 174: Was this talker the same one who recorded the vocabulary words? I don’t think that would affect the outcome, but some readers might want to know. Line 176, “The talker stood approximately 6 inches in front …”: Presumably this was intended to say that the talker’s mouth was 6 inches from the microphone? 
Line 194-195, “Visual inspection of the [soundcard] output did not reveal any frequency shaping or amplitude compression associated with the soundcard drivers.” This information would be more useful if the measurements had been made on the signals after they’d been transduced by the earphone; that’s where frequency shaping is most likely to be introduced. Even if those measurements were not made prior to testing, presumably they could be made after the fact. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

9 Feb 2022 Manuscript PONE-D-21-35370 Response to Reviewers Susan Nittrouer Academic Editor PLOS ONE February 9, 2022 Dear Dr. Nittrouer, Thank you for your consideration of our paper entitled “Effects of Word Familiarity and Receptive Vocabulary Size on Speech-in-Noise Recognition Among Young Adults with Normal Hearing” (PONE-D-21-35370) for PLOS ONE, and for enclosing your comments. We have carefully reviewed the comments and have revised the manuscript accordingly. The big issues we addressed were 1) justifying the choice to base participant hearing status on self-report, 2) clarifying some of the signal recording methods, and 3) improving the readability of Tables 1 and 2. In addition to these changes, we have added an analysis of word duration, which appears to account for the nonmonotonic effect of AoA that we observed behaviorally. Our responses to the reviewer comments are given point-by-point below and our changes to the manuscript are coded in red font. We believe the revised version of the manuscript has improved based on feedback and hope that you now find it suitable for publication. Best, Meredith D. Braza Doctor of Audiology Student The University of North Carolina at Chapel Hill Response to Reviewer 1: RESP: Thank you for your review of our paper. We have responded to each of your points below. The manuscript describes the results of studies of the effects of vocabulary size on open-set speech-in-noise recognition for adults with normal hearing. Data were collected using relatively standard methods, adapted to reflect Covid 19 restrictions. The general finding was that larger vocabulary size was associated with better recognition of speech in noise, consistent with some previous reports from others. A hypothesis that recognition of speech in noise would be associated with age of acquisition (or lexical frequency, with which it is correlated) was not supported.
Because of Covid 19, testing was done in the listeners’ home using tablet computers equipped for that purpose. Some sort of accommodation was unavoidable, and that was a reasonable choice that is preferable in some ways to alternatives such as testing online. However, it is not a perfect solution, and I’ve mentioned a few issues in the Specific Comments. Specific comments Line 122: It would be preferable in a study like this to confirm that the listeners had normal hearing. It seems likely that the authors relied on self-report of hearing status because testing was done remotely, but it would not have been difficult to include a simple pure-tone screening app on the tablet used for the actual study so that’s not a convincing explanation. However, screening would have required better control of absolute presentation level, which listeners were “asked to adjust” (line 197). This situation cannot be changed, but perhaps a brief justification for the choices that were made could be provided. RESP: To move forward with the consenting process, participants had to first verify that they do not believe they have hearing loss and have not had any ear infections in the past year. There is precedent in the psycholinguistics literature for using self-report as an indication of hearing status while doing supra-threshold speech perception experiments. As recruitment was based on word of mouth and included people within social networks of lab staff, many participants had documented normal hearing from participating in our labs previously and/or from being a test subject in the UNC Doctor of Audiology Program. Of the 47 participants in our study, 20 (43%) had normal hearing documented (13 participants in Experiment 1, 7 participants in Experiment 2). Concern over possible hearing loss in the remaining participants is tempered by the fact that the main questions of interest in this study were based on comparison of results across conditions. This has been added to the manuscript.
Line 174: Was this talker the same one who recorded the vocabulary words? I don’t think that would affect the outcome, but some readers might want to know. RESP: The talker who recorded the target words also recorded the carrier phrase, but a different talker recorded the PPVT words. This has been clarified in the manuscript by specifying each talker’s age. Line 176, “The talker stood approximately 6 inches in front …”: Presumably this was intended to say that the talker’s mouth was 6 inches from the microphone? RESP: Yes, the talker’s mouth was 6 inches from the microphone. This has been clarified in the manuscript. Line 194-195, “Visual inspection of the [soundcard] output did not reveal any frequency shaping or amplitude compression associated with the soundcard drivers.” This information would be more useful if the measurements had been made on the signals after they’d been transduced by the earphone; that’s where frequency shaping is most likely to be introduced. Even if those measurements were not made prior to testing, presumably they could be made after the fact. RESP: The goal of these measurements was to evaluate whether the soundcard was applying any signal conditioning. Many commercially available soundcard drivers modify the spectral and/or amplitude characteristics of the sound in their default configuration (e.g., bass boost or compression). Since this study relied on the soundcard of a tablet, we wanted to confirm that these features were not affecting the signal. Demonstrating linearity of the soundcard allows readers to rely on the frequency response published by the headphone manufacturer. Response to Academic Editor: RESP: Thank you for your review of our paper. We have responded to each of your points below. This was a reasonable study. The Introduction was especially well written. 
The purpose was to examine whether word familiarity and receptive vocabulary size affect speech recognition thresholds for young adults with normal hearing listening to isolated words. Although the study is fine as it is, I had two concerns, both related to the facts that I was not especially fond of the way word familiarity or vocabulary size were assessed and benchmarked. Age of acquisition is an extremely broad term and might not apply to each specific listener in the same manner; instead, age of acquisition may differ across individuals. The PPVT is a relatively insensitive measure of vocabulary size because one need only to have a low level of familiarity with a word in order to be able to select the picture that represents that word. Even measures of expressive vocabulary tap into a slightly deeper level of vocabulary knowledge than receptive vocabulary measures because the subject must retrieve the word from his/her own lexicon. David Pisoni developed a task that derives a more sensitive measure of the depth of a subject’s lexical knowledge; it is the Word Fam task. I highly recommend that task for future work. RESP: Thank you for the suggestion to consider using the task developed by David Pisoni in our future work. We elected to utilize the PPVT because 1) it is shown to be associated with speech perception outcomes in previous research, and 2) it could be easily adapted to a tablet-based format that was straightforward for remote participants. I also had a slightly different interpretation of the (weak) age-of-acquisition effects that were found. Early-acquired words – which are high frequency, as the authors suggest – may not require listeners to recover phonological structure to the same extent as later-acquired, more ‘difficult’ words. Therefore, less may be needed in terms of acoustic-phonetic information in order to recognize early acquired words. 
That idea might be important to interpretation of the data obtained in this study, especially if we entertain another notion often invoked when discussing speech-in-noise recognition, and that is the idea of listening in the dips. Although speech-shaped noise has few temporal or spectral dips, it does have some. Later-acquired, less-familiar words may require the listener to recover more “bits” of signal information from across the spectrum and temporal structure than earlier-acquired, more-familiar words. And listeners’ abilities to recover those signal bits and integrate them might help explain individual differences in speech-in-noise recognition. RESP: An association between age of acquisition and the minimum cues required for recognition does seem like a reasonable interpretation of our results. While dips in nominally steady noise could theoretically be associated with performance, variability in target level would likely be the dominant factor in variable audibility over time. Additional discussion of the association between age of acquisition and the number/quality of cues required for recognition has been added to the text. My final comment was that I found the Table 1 difficult to understand. It needs to be revised so that readers can immediately recognize what each row represents. RESP: Each row contains information about a parameter or the interaction between parameters, with factor levels indicated in parentheses. This has been indicated in the legend. The same revisions were made to Table 2. Submitted filename: Response to Reviewers 02.09.2022.docx Click here for additional data file. 14 Feb 2022 Effects of Word Familiarity and Receptive Vocabulary Size on Speech-in-Noise Recognition Among Young Adults with Normal Hearing PONE-D-21-35370R1 Dear Dr. Leibold, Thank you for submitting your revised manuscript to PLOS ONE, and for your careful attention to the comments of the reviewer. 
At this time we are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Susan Nittrouer, Ph.D.
Academic Editor
PLOS ONE

2 Mar 2022

PONE-D-21-35370R1

Effects of Word Familiarity and Receptive Vocabulary Size on Speech-in-Noise Recognition Among Young Adults with Normal Hearing

Dear Dr. Leibold:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff
on behalf of
Dr. Susan Nittrouer
Academic Editor
PLOS ONE

32 in total

1. Recognition of lexically controlled words and sentences by children with normal hearing and children with cochlear implants.

Authors: Laurie S Eisenberg; Amy Schaefer Martinez; Suzanne R Holowecky; Stephanie Pogorelsky
Journal: Ear Hear Date: 2002-10 Impact factor: 3.570

2. Integration of multiple speech segmentation cues: a hierarchical framework.

Authors: Sven L Mattys; Laurence White; James F Melhorn
Journal: J Exp Psychol Gen Date: 2005-11

3. Relationships among speech perception, production, language, hearing loss, and age in children with impaired hearing.

Authors: P J Blamey; J Z Sarant; L E Paatsch; J G Barry; C P Bow; R J Wales; M Wright; C Psarros; K Rattigan; R Tooher
Journal: J Speech Lang Hear Res Date: 2001-04 Impact factor: 2.297

4. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support.

Authors: Paul A Harris; Robert Taylor; Robert Thielke; Jonathon Payne; Nathaniel Gonzalez; Jose G Conde
Journal: J Biomed Inform Date: 2008-09-30 Impact factor: 6.317

5. The intelligibility of speech as a function of the context of the test materials.

Authors: G A Miller; G A Heise; W Lichten
Journal: J Exp Psychol Date: 1951-05

6. Individual differences in top-down restoration of interrupted speech: links to linguistic and cognitive abilities.

Authors: Michel Ruben Benard; Jorien Susanne Mensink; Deniz Başkent
Journal: J Acoust Soc Am Date: 2014-02 Impact factor: 1.840

7. Improving speech-in-noise recognition for children with hearing loss: potential effects of language abilities, binaural summation, and head shadow.

Authors: Susan Nittrouer; Amanda Caldwell-Tarr; Eric Tarr; Joanna H Lowenstein; Caitlin Rice; Aaron C Moberly
Journal: Int J Audiol Date: 2013-08 Impact factor: 2.117