
Variations in the slope of the psychometric functions for speech intelligibility: a systematic survey.

Alexandra MacPherson, Michael A Akeroyd.

Abstract

Although many studies have looked at the effects of different listening conditions on the intelligibility of speech, their analyses have often concentrated on changes to a single value on the psychometric function, namely, the threshold. Far less commonly has the slope of the psychometric function, that is, the rate at which intelligibility changes with level, been considered. The slope of the function is crucial because it is the slope, rather than the threshold, that determines the improvement in intelligibility caused by any given improvement in signal-to-noise ratio by, for instance, a hearing aid. The aim of the current study was to systematically survey and reanalyze the psychometric function data available in the literature in an attempt to quantify the range of slope changes across studies and to identify listening conditions that affect the slope of the psychometric function. The data for 885 individual psychometric functions, taken from 139 different studies, were fitted with a common logistic equation from which the slope was calculated. Large variations in slope across studies were found, with slope values ranging from as shallow as 1% per dB to as steep as 44% per dB (median = 6.6% per dB), suggesting that the perceptual benefit offered by an improvement in signal-to-noise ratio depends greatly on listening environment. The type and number of maskers used were found to be major factors on the value of the slope of the psychometric function while other minor effects of target predictability, target corpus, and target/masker similarity were also found.
© The Author(s) 2014.


Keywords:  perceptual benefit; psychometric functions; speech-in-noise understanding


Year:  2014        PMID: 24906905      PMCID: PMC4227668          DOI: 10.1177/2331216514537722

Source DB:  PubMed          Journal:  Trends Hear        ISSN: 2331-2165            Impact factor:   3.293


Introduction

A psychometric function describes the relationship between an observer's performance on a psychophysical task and some physical aspects of the stimuli. In particular, the psychometric function for speech intelligibility in noise describes a listener's ability to identify speech as a function of its intensity. Often, the psychometric function is summarized by two key parameters: the threshold, being the stimulus level required to give a particular level of performance (e.g., 50% correct), and the slope, being the maximum rate at which performance increases with changes in the stimulus level. Many studies have demonstrated that thresholds for speech intelligibility in noise depend greatly on various aspects of the target speech (e.g., French & Steinberg, 1947), the interfering sound (e.g., Carhart, Tillman, & Greetis, 1969; Festen & Plomp, 1990; French & Steinberg, 1947; Miller, 1947), and the listener (e.g., Duquesnoy, 1983; Festen & Plomp, 1990; Peters, Moore, & Baer, 1998).

The situations, however, that result in changes to the rate at which intelligibility improves with an increase in the level of speech have been much less extensively studied. The slope is crucial, as it, not the threshold, determines the increase in perceptual benefit a listener is likely to gain from small changes in the signal-to-noise ratio (SNR), such as may be offered by a directional microphone on a hearing aid. A steep psychometric function indicates that a small increase in SNR would lead to a large increase in intelligibility; conversely, if the slope is relatively shallow, the same SNR improvement would lead to a smaller perceptual improvement. We demonstrate here how much the slope of the psychometric function varies across experiments.

There is a wealth of psychometric function data available in the literature on speech identification, as many studies have looked at the factors that can affect the intelligibility of speech. Most of the published analyses of these data, however, have focused on changes in threshold, with slope changes far less commonly calculated and reported. No systematic corpus of these data is available, despite its obvious importance for isolating and identifying the factors associated with changes in slopes. We therefore carried out a systematic survey of the literature on psychometric functions for speech intelligibility, reanalyzing the data using a standard method to enable a direct comparison of slope data across different studies. Our aims were to (a) quantify how much the slope of the psychometric function varies across experimental designs and listening conditions, (b) identify listening conditions that affect the slope of the psychometric function, and (c) discuss how these trends in slope conform with previously proposed explanations for variations in the slope of the psychometric function for speech intelligibility.
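As a rough illustration of why the slope matters, the sketch below converts a fixed SNR improvement into a change in percent correct. It assumes a logistic psychometric function whose maximum slope equals the quoted value, a listener who starts at the 50% point, and a hypothetical 2-dB SNR improvement. The three slopes are the shallowest, median, and steepest values quoted in the abstract; the function names are illustrative only.

```python
import math


def percent_correct(snr_db, slope_pct_per_db, threshold_db=0.0):
    """Logistic psychometric function with the given maximum slope (% per dB).

    Assumed form: P = 100 / (1 + exp(m * (x - c))), with m = -slope / 25,
    so that the slope at the 50% point equals slope_pct_per_db.
    """
    m = -slope_pct_per_db / 25.0
    return 100.0 / (1.0 + math.exp(m * (snr_db - threshold_db)))


def benefit(slope_pct_per_db, snr_gain_db, threshold_db=0.0):
    """Change in percent correct when SNR rises by snr_gain_db from the 50% point."""
    before = percent_correct(threshold_db, slope_pct_per_db, threshold_db)
    after = percent_correct(threshold_db + snr_gain_db, slope_pct_per_db, threshold_db)
    return after - before


# Hypothetical 2-dB SNR improvement, evaluated at three slopes from the survey.
for slope in (1.0, 6.6, 44.0):
    print(f"slope {slope:5.1f} %/dB -> benefit {benefit(slope, 2.0):5.1f} points")
```

Under these assumptions the same 2-dB improvement is worth about 2, 13, and 47 percentage points, respectively. The steepest case falls short of the linear estimate (2 × 44 = 88 points) because the function saturates toward 100%.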

Methods

A computerized literature search was undertaken to find studies that had measured the intelligibility of speech as a function of SNR. The first reports of common speech tests and the studies citing these speech tests were reviewed, as many of these studies include psychometric functions in different noise conditions. A search was also carried out for articles citing either Egan, Carterette, and Thwing (1954) or Brungart (2001a)—these two studies were singled out as they reported unusually shaped psychometric functions of masked speech. The reference list of Brungart's article was also reviewed for possible studies to include in the survey. Other miscellaneous studies containing psychometric functions that were found over the course of approximately three years, up to a cutoff date of February 2012,[1] were also included. The inclusion criteria were that studies needed to report at least one psychometric function for speech identification that was (a) measured as a function of SNR or some other unit of relative presentation level from which SNR could be calculated, (b) measured over at least three points, (c) presented clearly in graphical or tabular form, and (d) averaged over several listeners. Individual data were excluded because we found that these data tended to be harder to accurately measure (e.g., multiple overlaying psychometric functions). Although interlistener variability in slope would undoubtedly provide additional insight into the factors affecting slope, such an analysis of the data was outside the scope of the current study, which aims to identify broad trends in slope across different listening conditions. Micheyl, Xiao, and Oxenham (2012) provide an example of a detailed reanalysis of psychometric data that does explicitly take into account individual variability. A total of 146 relevant studies were found, giving 1,133 individual psychometric functions for further analysis. The individual data points for each psychometric function were recorded. 
These values were either taken directly from the article if the psychometric functions were reported in tabular form or extracted using a custom-written MATLAB program if the psychometric functions were displayed graphically. These data points were then fitted with a logistic function:

$$P = \frac{100}{1 + e^{m(x - c)}} \qquad (1)$$

where x is the SNR (decibels), P is the percentage of correctly identified items, and m and c are constants: c is the SNR at which P = 50% correct, and m sets the slope of the function at x = c. The slope (in % per dB) of the function at that point is equal to −25m. The best fitting values of m and c were found using the solver function of Microsoft Excel (Microsoft, 2011), which uses a nonlinear least squares method. A logistic function was selected, as it has been suggested to be a reasonable sigmoidal model for psychometric data (Wichmann & Hill, 2001) and has been commonly used to describe psychometric functions for speech intelligibility (e.g., Festen & Plomp, 1990; Pichora-Fuller, Schneider, & Daneman, 1995; Rhebergen & Versfeld, 2005; Wightman, Callahan, Lutfi, Kistler, & Oh, 2003). For consistency across all studies, none of the logistic fits was corrected for either chance or maximum performance. The information required for these corrections was not always available, and it was considered preferable to follow a standard procedure for all cases rather than correcting only a subset of the data. It is possible that this lack of correction for chance and ceiling effects could have affected slope estimates (Dai & Micheyl, 2011). Cases for which the standardized psychometric function was an extremely poor fit were excluded, however, to limit the effects of such errors in slope estimates (see Overview section). The values of slope and c were added to the database with coding information on the experimental design (see later). In 219 psychometric functions, all data points were either below P = 50% or above P = 50%.
In these cases, m, which is defined at P = 50%, is an extrapolation of the data. As such, slopes calculated in this way are unlikely to be good representations of the true slope of the data, and so were excluded from further analysis. Each psychometric function in the survey was subjected to detailed coding of the experimental design for (1) target speech corpus (see later), (2) masker type (subcategories of speech, modulated noise,[2] or static noise), (3) number of maskers, (4) presentation of stimuli (subcategories of monaural, diotic, or dichotic), (5) spatial locations of target and masker, (6) target language, (7) target predictability (subcategories of high predictability from context or low predictability from context), (8) whether the target was primed before presentation, (9) any signal processing of target or masker (subcategories of vocoded, filtered, or added reverberation). If the masker was competing speech, then further coding was carried out: (1) masker language, (2) masker corpus, (3) gender of the masker talker relative to the target talker (subcategories of same gender, different gender, or same talker), (4) masker intelligibility (subcategories of intelligible or unintelligible), (5) masker uncertainty (subcategories of masker talker fixed from trial to trial or masker content fixed from trial to trial), (6) pitch shift between target and masker voices (subcategories of small if less than 3 semitones, medium if 4–7 semitones, or large if greater than 8 semitones). Finally, general information about the studies' participants was also coded: (1) age-group (subcategories of children, young adult, or older adult) and (2) hearing loss (subcategories of normal hearing, a reported hearing loss, or cochlear implant user). 
The target and masker speech were coded by the type of speech corpora used (e.g., BKB, IEEE, CRM, and SPIN).[3] If this information was not available, or if the speech corpus was uncommon, the speech corpus was coded under the categories of valid sentences, invalid sentences, words, digits, continuous speech, or short tokens. Valid sentences described any stimuli consisting of syntactically and semantically correct sentences (e.g., sentences read from a history text book); invalid sentences described any stimuli consisting of either syntactically incorrect sentences (“cat on sat the mat”) or semantically incorrect sentences (“the thorn can wake the kettle”); continuous speech described any speech stimuli longer than a single sentence; and short tokens described smaller speech units such as syllables and phonemes.
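The curve-fitting step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure: the logistic form P = 100 / (1 + exp(m(x − c))) is assumed because it is consistent with the stated slope of −25m, a simple compass (pattern) search stands in for the Excel Solver, and the data points, starting values, and function names are all hypothetical.

```python
import math


def logistic(x, m, c):
    """Percent correct at SNR x (dB): P = 100 / (1 + exp(m * (x - c)))."""
    z = max(-700.0, min(700.0, m * (x - c)))  # clamp to avoid overflow in exp
    return 100.0 / (1.0 + math.exp(z))


def sum_squared_error(points, m, c):
    return sum((p - logistic(x, m, c)) ** 2 for x, p in points)


def fit_logistic(points, tol=1e-6, max_iter=10_000):
    """Least-squares fit of m and c by compass search (stand-in for a solver)."""
    xs = [x for x, _ in points]
    m, c = -0.1, sum(xs) / len(xs)  # start: shallow rising curve centred on the data
    step_m, step_c = 0.1, 1.0
    best = sum_squared_error(points, m, c)
    for _ in range(max_iter):
        if step_m < tol and step_c < tol:
            break
        # Try the eight neighbouring (m, c) pairs at the current step size.
        moves = [(m + i * step_m, c + j * step_c)
                 for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0)]
        candidate = min(moves, key=lambda mc: sum_squared_error(points, *mc))
        error = sum_squared_error(points, *candidate)
        if error < best:
            (m, c), best = candidate, error   # accept the improving move
        else:
            step_m /= 2.0                      # no improvement: refine the steps
            step_c /= 2.0
    return m, c


def slope_percent_per_db(m):
    """Slope at the 50% point, in % per dB."""
    return -25.0 * m


# Hypothetical data: (SNR in dB, percent correct), generated from m = -0.4, c = -5.
data = [(x, logistic(x, -0.4, -5.0)) for x in (-12, -9, -6, -3, 0)]
m_fit, c_fit = fit_logistic(data)
print(f"m = {m_fit:.3f}, c = {c_fit:.2f} dB, slope = {slope_percent_per_db(m_fit):.1f} %/dB")
```

A pattern search is used here only to keep the example dependency-free; any nonlinear least-squares routine would serve the same purpose.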

Results

Overview

To measure how well the logistic equation fitted the data, the root mean square (RMS) error between the fitted curve and the data points was calculated. On the whole, the fits were regarded as good, as the RMS error was small (mean RMS = 3.2%). However, 29 psychometric functions had RMS values of 10% or greater and so were excluded from the survey at this stage (they are further discussed in the Nonmonotonic Psychometric Functions section). Figure 1 shows example data from the survey and illustrates some good, as well as some poor, fits of the logistic functions to the data.
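The two exclusion rules (RMS error of 10% or greater, and slopes based on extrapolation because all points lie on one side of 50%) can be sketched as a simple screening function. This is an illustrative reading of the text: the RMS is assumed to be the square root of the mean squared residual in percentage points, the logistic form is assumed to be P = 100 / (1 + exp(m(x − c))), and the function names and data are hypothetical.

```python
import math


def logistic(x, m, c):
    """Assumed logistic form: P = 100 / (1 + exp(m * (x - c)))."""
    return 100.0 / (1.0 + math.exp(m * (x - c)))


def rms_error(points, m, c):
    """Root mean square deviation (in percentage points) of the curve from the data."""
    residuals = [p - logistic(x, m, c) for x, p in points]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def spans_50_percent(points):
    """False when every point lies below 50% or every point lies above 50%."""
    scores = [p for _, p in points]
    return min(scores) <= 50.0 <= max(scores)


def keep_function(points, m, c, rms_limit=10.0):
    """Keep a fit only if it is not an extrapolation and its RMS error is below the limit."""
    return spans_50_percent(points) and rms_error(points, m, c) < rms_limit


# Hypothetical example: a flat fit (m = 0 predicts 50% everywhere) to two points.
flat = [(-10.0, 40.0), (0.0, 60.0)]
print(rms_error(flat, 0.0, 0.0), keep_function(flat, 0.0, 0.0))
```

In this toy case the RMS error is exactly 10, so the function would be excluded under the "10% or greater" rule.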
Figure 1.

Example psychometric functions from the survey illustrating examples of good, average, below average, and poor fits of the standard logistic function (solid line) to the data (open circles). The RMS value gives an indication of the fit, with cases where the RMS value was above 10% being excluded from the survey. Cases that gave good fits include those for SSI sentences in a one-talker masker (Dirks & Wilson, 1969a), SPIN sentences in a six-talker babble (Elliott, 1979), and digits in a speech spectrum static noise (HearCom, 2009). Cases that had average fits (i.e., RMS values close to the mean for the survey) include those for SPIN sentences in a six-talker babble (Dirks, Bell, & Rossman, 1986), CRM sentences in an amplitude-modulated noise (Arbogast, Mason, & Kidd, 2002), and IEEE sentences in a Gaussian noise (Bernstein & Grant, 2009). Example cases that had below-average fits include those for CRM sentences in a two-talker masker (Wightman & Kistler, 2005), digits in a six-talker babble (Wilson et al., 2006), and invalid short tokens in a one-talker masker (Danhauer, Doyle, & Lucks, 1986). Examples of poor fits include valid sentences presented in a one-talker masker (Dirks & Bower, 1969) and CRM sentences in a one-talker masker (Brungart, 2001a).

After these removals, and those of cases whose slope values were based on extrapolation, 885 psychometric functions remained in the survey, taken from 139 different studies. Table 1 summarizes the stimuli and participant information for each study (all studies are listed in the references). Full details on all coded factors for each study (including those excluded) can be found in the supplementary material.
Table 1.

Key Details of All the Studies Included in the Systematic Survey.

Study | N | Target corpus | Masker type | No. maskers | Masker corpus | Presentation | Age | Hearing
Acton (1970) | 2 | PB word list | Static noise | | | Free field | Y | NH
Arbogast et al. (2002) | 24 | CRM | Speech and modulated noise | 1 | CRM | Free field | Y | NH
Barker and Cooke (2007) | 1 | Valid sentence | Modulated noise | | | Diotic | Y | NH
Beattie (1989) | 1 | W-22 | Static noise | | | Monaural | | HI
Beattie, Barr, and Roup (1997) | 1 | W-22 | Speech | 20 | Valid sentences | Monaural | Y | NH
Beattie and Clark (1982) | 4 | SSI | Speech | 4 | Valid sentences | Monaural | Y | NH
Bernstein and Grant (2009) | 12 | IEEE | Speech, modulated, and static | 1 | HINT | Monaural | Y and O | NH and HI
Best, Gallun, Mason, Kidd, and Shinn-Cunningham (2010) | 5 | CRM | Speech and static noise | 1 | CRM | Dichotic | Y | NH and HI
Bhattacharya and Zeng (2007) | 17 | Short tokens and HINT | Static noise | 1 | | Diotic | Y | NH and CI
Blue-Terry and Letowski (2011) | 1 | Modified Rhyme Test | Static noise | | | Diotic | Y | NH
Boothroyd (2008) | 4 | Short tokens | Static noise | | | | Y | NH
Boothroyd and Nittrouer (1988) | 5 | Short tokens and invalid sentences | Static noise | | | Diotic | Y | NH
Bosman and Smoorenburg (1995) | 17 | Short tokens, valid, and invalid sentences | Static noise | | | Monaural | Y and O | NH and HI
Bronkhorst, Bosman, and Smoorenburg (1993) | 4 | Short tokens | Static noise | | | Monaural | Y | NH
Brungart (2001a) | 2 | CRM | Static noise | | | Diotic | Y | NH
Brungart (2001b) | 1 | CRM | Speech, modulated, and static | 1 | CRM | Diotic | Y | NH
Brungart, Simpson, Ericson, and Scott (2001) | 16 | CRM | Speech and modulated noise | 1, 2, or 3 | CRM | Diotic | Y | NH
Brungart, Darwin, Arbogast, and Kidd (2005) | 1 | CRM | Speech | 1 | CRM | Dichotic | Y | NH
Brungart, Chang, Simpson, and Wang (2006) | 6 | CRM | Modulated and static noise | | | Diotic | Y | NH
Brungart, Chang, Simpson, and Wang (2009) | 15 | CRM | Speech | 1, 2, or 3 | CRM | Diotic | Y | NH
Brungart, Iyer, and Simpson (2006) | 8 | CRM | Speech | 1 | CRM | Diotic | Y | NH
Brungart and Simpson (2002) | 5 | CRM | Speech, modulated, and static | 1 or 2 | CRM | Monaural/dichotic | Y | NH
Brungart and Simpson (2004) | 4 | CRM | Speech | 1 or 2 | CRM | Monaural/dichotic | Y | NH
Brungart and Simpson (2007) | 14 | CRM | Speech | 1, 2, or 3 | CRM | Monaural/dichotic | Y | NH
Brungart, Simpson, and Freyman (2005) | 3 | CRM | Speech and static noise | 1 | CRM | Free field | Y | NH
Cienkowski and Speaks (2000) | 1 | Short tokens | Static noise | | | Monaural | Y and O | NH and HI
Cooper and Cutts (1971) | 2 | NU-6 | Static noise | | | Monaural | Y | NH
Craig (1988) | 9 | SPIN | Speech | 6 | SPIN | Monaural | Y | NH
Crandell (1993) | 1 | BKB | Speech | 6 | SPIN | Diotic | C | HI
Danhauer et al. (1986) | 3 | Invalid sentences | Speech, modulated, and static | 9 | Valid sentences | Diotic | Y | NH
Danhauer and Leppler (1979) | 3 | Short token | Speech and modulated noise | 4 or 9 | Valid sentences | Diotic | Y | NH
Darwin, Brungart, and Simpson (2003) | 9 | CRM | Speech | 1 | CRM | Diotic | Y | NH
Dirks, Bell, Rossman, and Kincaid (1986) | 3 | SPIN | Speech and static noise | 6 | Valid sentences | Monaural | Y | NH
Dirks and Bower (1969) | 37 | Valid sentences | Speech and modulated noise | 1 | Valid sentences | Monaural | Y | NH
Dirks, Morgan, and Dubno (1982) | 8 | NU-6 | Speech | 6 | Valid sentences | Monaural | Y | NH
Dirks and Wilson (1969a) | 18 | Short token and SSI | Speech and static noise | 1 | Continuous | Diotic | Y and O | NH and HI
Dirks and Wilson (1969b) | 20 | PB word list | Static noise | 1 | Short tokens/PB | Free field and diotic | Y | NH
Dirks, Wilson and Bower (1969) | 42 | NU-6 | Modulated and static noise | | | Monaural | Y | NH
Drullman (1995) | 2 | Valid sentences | Static noise | | | Monaural | Y | NH
Drullman and Bronkhorst (2004) | 7 | Valid sentences | Speech | 1 | Valid sentences | Diotic | Y | NH
Dubno, Horwitz, and Ahlstrom (2005) | 2 | NU-6 | Static noise | | | Monaural | Y | NH
Egan (1948) | 1 | Short tokens | Static noise | | | Monaural | |
Egan, Carterette, and Thwing (1954) | 1 | Valid sentences | Speech | 1 | Valid sentences | Monaural | Y | NH
Eisenberg, Dirks and Bell (1995) | 4 | SPIN | Modulated and static noise | | | Monaural | Y | NH
Elliott (1979) | 6 | SPIN | Speech | 6 | Valid sentences | Diotic | C | NH
Erber (1971) | 3 | Short tokens | Static noise | | | Diotic | Y | NH
Ezzatain, Li, Pichora-Fuller, and Schneider (2010) | 28 | Invalid sentences | Speech and static noise | 1 or 2 | Invalid sentences | Free field | Y and O | NH and HI
Feeney and Franks (1982) | 4 | Short tokens, PB List, and Modified Rhyme | Static noise | | | Monaural | Y | NH
Festen and Plomp (1990) | 6 | Valid sentences | Modulated noise | | | Monaural | Y | NH
Foster and Haggard (1987) | 1 | FAAF | Static noise | | | Monaural | Y | NH
Freyman, Balakrishnan, and Helfer (2001) | 16 | Invalid sentences | Speech and modulated noise | 1 or 2 | Invalid sentences | Free field | Y | NH
Freyman, Balakrishnan, and Helfer (2004) | 16 | Invalid sentences | Speech and static noise | 1, 3, 5, or 9 | Invalid sentences | Free field | Y | NH
Freyman, Helfer, and Balakrishnan (2007) | 6 | Invalid sentences | Speech | 1 | Invalid sentences | Free field | Y | NH
Freyman, Helfer, McCall, and Clifton (1999) | 9 | Invalid sentences | Speech and static noise | 1 | Invalid sentences | Free field | Y | NH
Friesen, Shannon, Baskent, and Wang (2010) | 17 | Short tokens and HINT | Static noise | | | Free field | O | NH and CI
Fu, Shannon, and Wang (1998) | 9 | Short tokens | Static noise | | | Diotic | Y | NH
Gelfand (1998) | 2 | Short tokens | Static noise | | | Monaural | Y | NH
Grant and Braida (1991) | 7 | IEEE | Static noise | | | Diotic | Y | NH
Griffiths (1967) | 4 | Modified Rhyme test | Static noise | | | Monaural | Y | NH
Hagerman (1982) | 1 | Hagerman sentences | Modulated noise | | | Monaural | Y | NH
Hallgren, Larsby, and Arlinger (2006) | 1 | HINT | Static noise | | | Free field | Y | NH
HearCom (2009) | 13 | Digits and Matrix | Static noise | | | Monaural | |
Helfer and Freyman (2005) | 8 | Invalid sentences | Speech and static noise | 2 | Invalid sentences | Free field | Y | NH
Helfer and Freyman (2008) | 4 | Invalid sentences | Speech and modulated noise | 2 | Invalid sentences | Free field | Y | NH
Helfer and Freyman (2009) | 12 | TMV sentences | Speech | 1 or 2 | TMV sentences | Free field | Y | NH
Hirsh, Reynolds, and Joseph (1954) | 2 | Words and invalid sentences | Static noise | | | Monaural | Y | NH
Horii, House, and Hughes (1971) | 2 | Vowels and consonants | Modulated and static noise | | | Diotic | Y | NH
House, Williams, Hecker, and Kryter (1965) | 1 | Words | Static noise | | | Monaural | Y | NH
Howard-Jones and Rosen (1993) | 5 | Words | Modulated and static noise | | | Diotic | Y | NH
Ihlefeld and Shinn-Cunningham (2008) | 3 | CRM | Speech | 1 | CRM | Dichotic | Y | NH
Jerger and Jordan (1992) | 2 | Continuous speech | Speech | 1 | Continuous | Free field | Y | NH
Jerger, Jerger, and Lewis (1981) | 2 | PSI Test | Speech | 1 | PSI Test | Free field | Y | NH
Johnstone and Litovsky (2006) | 12 | Spondees | Speech and modulated noise | 1 | IEEE | Free field | C and Y | NH
Kalikow et al. (1977) | 4 | SPIN | Speech | 12 | Valid Sentences | Diotic | Y and O | NH
Kates and Arehart (2005) | 1 | HINT | Modulated noise | | | Monaural | Y | NH
Keith and Talis (1970) | 3 | W-22 | Static noise | | | Monaural | Y and O | NH and HI
Kidd, Mason, and Gallun (2005) | 3 | CRM | Speech and static noise | 1 or 2 | CRM | Monaural | Y | NH
Krull, Choi, Kirk, Prusick, and French (2010) | 3 | Words | Static noise | | | Diotic | C | NH
Kryter (1962) | 7 | Syllables, PB, MRT and valid sentence | Static noise | | | Monaural | Y | NH
Kryter and Whitman (1965) | 1 | PB words and MRT | Static noise | | | Monaural | Y | NH
Lewis, Benignus, Muller, Malott, and Barton (1988) | 3 | SPIN | Static noise | | | Monaural | Y | NH
Li, Daneman, Qi, and Schneider (2004) | 18 | Invalid sentences | Speech and static noise | 2 | Invalid sentences | Free field | Y and O | NH
Li and Loizou (2009) | 7 | IEEE | Speech and static noise | 2 | IEEE | Diotic | Y | NH
MacLeod and Summerfield (1990) | 1 | ASL | Static noise | | | Monaural | Y | NH
Martin and Mussell (1979) | 2 | Words and SSI | Speech and static noise | 2 | Valid sentences | Dichotic | Y | ?
McArdle, Wilson, and Burks (2005) | 5 | NU-6, digits, and IEEE | Speech | 6 | Valid sentences | Diotic | Y and O | NH and HI
Miller, Heise, and Litcten (1951) | 13 | Syllables, words, digits, and valid sentences | Static noise | | | Monaural | ? | ?
Ng, Meston, Scollie, and Seewald (2011) | 6 | BKB | Speech | 6 | Valid sentences | Free field | C | NH and HI
Neiderjohn and Grotelueschen (1976) | 4 | PB Lists | Static noise | | | Monaural | Y | NH
Nielsen and Dau (2009) | 2 | HINT | Static noise | | | Diotic | Y | NH
Oxenham and Simonson (2009) | 12 | HINT | Speech and modulated noise | 1 | IEEE | Diotic | Y | NH
Ozimek, Kutzner, Sek, and Wicher (2009) | 1 | Valid sentences | Speech | 6 | Valid sentences | Diotic | Y | NH
Ozimek, Warzybok, and Kutzner (2010) | 1 | Matrix | Speech | 6 | Matrix | Diotic | Y | NH
Pederson and Studebaker (1972) | 2 | Words | Static noise | | | Monaural | Y | NH
Pichora-Fuller et al. (1995) | 6 | SPIN | Speech | 8 | SPIN | Monaural | Y and O | NH and HI
Pichora-Fuller, Schneider, MacDonald, Pass, and Brown (2007) | 4 | SPIN | Speech | 8 | SPIN | Monaural | Y | NH
Plomp and Mimpen (1979) | 1 | Valid sentences | Static noise | | | Diotic | Y | NH
Rakerd, Aaronson, and Hartmann (2006) | 3 | CRM | Speech | 2 or 3 | CRM | Free field | Y | NH
Rao and Letowski (2006) | 5 | CAT test | Speech and static noise | 6 | CAT test | Diotic | Y | NH
Rogers, Lister, Febo, Besing, and Abrams (2006) | 3 | W-22 | Static noise | | | Diotic | Y | NH
Schultz and Schubert (1969) | 2 | W-22 and MCDT | Static noise | | | Monaural | ? | ?
Scott, Rosen, Wickham, and Wise (2004) | 1 | BKB | Static noise | | | Diotic | Y | NH
Sergeant, Atkinson, and Lacroix (1979) | 3 | TTI, MRT, and CID | Static noise | | | Monaural | Y | NH
Sherbecoe and Studebaker (2002) | 2 | CST | Static noise | | | Monaural | Y | NH
Speaks and Karmen (1967) | 1 | Valid sentences | Static noise | | | Monaural | Y | NH
Speaks, Karmen, and Benitez (1967) | 4 | SSI | Speech | 1 | SSI | Monaural | Y | NH
Speaks, Parker, Kuhl, and Harris (1972) | 3 | Continuous speech | Static noise | | | Monaural | Y | NH
Stickney, Zeng, Litovsky, and Assmann (2004) | 10 | IEEE | Speech and static noise | 1 | IEEE | Monaural | Y | NH and CI
Studebaker, Taylor, and Sherbecoe (1994) | 4 | Words | Static noise | | | Monaural | Y | NH
Surprenant (2007) | 1 | Syllables | Static noise | | | Diotic | Y and O | NH
Surr and Schwartz (1980) | 1 | CCT | Speech | 12 | Valid sentences | Diotic | Y | HI
Suter (1985) | 4 | MRT and CID | Speech | 12 | Valid sentences | Monaural | ? | HI
Tabri, Smith Abou Chacra, and Pring (2011) | 8 | SPIN | Speech | 6 | Valid sentences | Diotic | Y | NH
Takahashi and Bacon (1992) | 8 | SPIN | Modulated and static noise | | | Monaural | Y and O | NH and HI
Theodoridis and Schoeny (1988) | 3 | W-22 | Static noise | | | Monaural | Y | NH
Theodoridis, Schoeny, and Anné (1985) | 2 | W-22 | Static noise | | | Monaural | Y | NH
Thomas and Ravindran (1974) | 3 | PB words | Static noise | | | Monaural | ? | ?
Trammell and Speaks (1970) | 2 | Valid sentences | Speech | 1 | Valid sentences | Monaural | Y | NH
Tun (1998) | 6 | Valid sentences | Speech | 20 | Valid sentences | Diotic | Y and O | NH
Van Wieringen and Wouters (2008) | 4 | Digits and LIST | Static noise | | | Diotic | Y | NH
Vestergaard, Fyson, and Patterson (2009) | 3 | Syllables | Speech and static noise | 1 | Syllables | Monaural | Y | NH
Wagener, Josvassen and Ardenkjoer (2003) | 1 | Dantale 2 | Static noise | | | Diotic | Y | NH
Whitmal, Poissant, Freyman, and Helfer (2007) | 6 | Syllables and valid sentences | Speech and static noise | 1 or 2 | Invalid sentences | Diotic | Y | NH
Wightman and Kistler (2005) | 28 | CRM | Speech and static noise | 1 or 2 | Invalid sentences | Monaural | Y | NH
Williams and Hecker (1968) | 1 | HINT | Static noise | | | Monaural | Y | NH
Wilson and Antablin (1980) | 2 | PIT and NU-6 | Static noise | | | Monaural | Y | NH
Wilson and Burks (2005) | 3 | Words | Speech | 6 | Words | Monaural | O | HI
Wilson, Burks, and Weakley (2006) | 12 | NU-6 and digits | Speech | 6 | Valid sentences | Diotic | O | HI
Wilson, Carnell, and Cleghorn (2007) | 4 | Words | Speech and static noise | 6 | Valid sentences | Monaural | Y and O | NH and HI
Wilson and Cates (2008) | 2 | Words | Speech | 6 | Valid sentences | Diotic | Y and O | NH and HI
Wilson, Farmer, Gandhi, Shelburne, and Weaver (2010) | 8 | Words | Static noise | | | Monaural | C and Y | NH
Wilson and McArdle (2007) | 4 | Words | Speech | 6 | Valid sentences | Diotic | O | HI
Wilson et al. (2010) | 13 | Words | Speech, modulated, and static | 6 | Valid sentences | Monaural | Y and O | NH and HI
Wilson, McArdle, and Roberts (2008) | 14 | PB, W-22, NU-6, and digits | Static noise | | | Monaural | Y | NH
Wilson, McArdle, and Smith (2007) | 6 | Words, IEEE, and BKB | Speech | 6 | Valid sentences | Monaural | Y and O | NH and HI
Wilson and Oyler (1997) | 2 | W-22 and NU-6 | Static noise | | | Diotic | Y | NH
Wilson and Strouse (2002) | 8 | NU-6 | Speech | 1 | Valid sentences | Diotic | Y and O | NH and HI
Wu et al. (2005) | 6 | Invalid sentences | Speech and static noise | 1 or 2 | Invalid sentences | Free field | Y | NH
Yang et al. (2007) | 18 | Syllables and words | Speech and static noise | 1 or 2 | Invalid sentences | Free field | Y | NH
Young, Goodman, and Carhart (1979) | 3 | Words | Speech | 5 | Valid sentences | Monaural | Y | NH

Note. C = children; Y = young adults; O = older adults; NH = normal hearing; HI = hearing impaired; CI = cochlear implant user. For speech corpus codes, see note 3.

It was found that a log-normal distribution (Buzsáki & Mizuseki, 2014; Johnson & Kotz, 1970) gave an excellent fit to the overall frequency distribution of slope values (Equation 2), where f is frequency and s is slope. The best fitting values of θ and σ were found using Excel's Solver, which gave values of 1.46 and 0.63, respectively. Figure 2 shows the overall distribution of slope values and this best-fitting log-normal curve. It can be seen that there is a very wide variation in the slope. The minimum and maximum values of slope were 0.4% per dB and 43.8% per dB, the mean was 7.5% per dB, and the median was 6.6% per dB. There was a clear positive skew, with the bulk of values, including the median, lying to the left side of the mean.
Figure 2.

The overall distribution of slope values measured in the systematic slope survey, across all 885 cases (see Equation 2). The solid line is a log-normal distribution fitted to the data. The median for the distribution is indicated by an arrow.


Major Trends

With 885 cases, it is not too surprising to find substantial variations across details of stimuli, maskers, and other aspects of experimental design. The analysis here therefore concentrates on broad categories rather than on specific individual combinations. The full data set is available in the supplementary material.

Type of masking noise

The first major trend in the slope survey data is that speech maskers give shallower psychometric functions than either amplitude-modulated noise maskers or static noise maskers. Table 2 shows the median slopes and interquartile ranges of psychometric functions measured for the six general classes of speech stimuli and the seven most commonly reported speech corpora when different types and numbers of maskers were used. Table 3 shows the number of studies and the number of individual psychometric functions that these values are based on.[4] It can be noted from Table 2 that of the 11 different target speech types for which slopes have been measured in both a speech masker and a noise masker (be it either modulated noise or static noise), eight of them gave smaller median slope values for psychometric functions measured in a speech masker than they did in a noise masker (namely, Words, Valid sentences, Invalid sentences, Continuous speech, CRM, HINT, SSI, and Other).
Table 2.

Median Slope Values for Each of the Primary Target/Masker Combinations Identified in the Survey.

Masker | Short tokens | Words | Digits | Valid sentences | Invalid sentences | Continuous speech | CRM | HINT | IEEE | NU-6 | PB lists | SPIN | SSI | Other
1 Speech masker: 5.1 (3.3); 6.7 (2.2); 2.5 (1.2); 6.5 (4.2); 5.7 (–); 3.7 (1.5); 3.4 (2.0); 4.5 (1.3); 8.7 (–); 4.6 (3.2); 4.2 (1.6)
2 Speech maskers: 7.9 (2.2); 7.7 (–); 7.7 (5.5); 6.3 (1.4); 4.2 (3.5); 4.3 (–); 9.8 (3.4)
3 Speech maskers: 15.1 (–); 9.2 (2.4)
4 + Speech maskers: 2.3 (–); 7.5 (2.1); 9.9 (3.6); 9.5 (0.8); 9.1 (2.5); 15.9 (–); 6.2 (3.3); 8.2 (7.2); 13.2 (4.5); 8.4 (3.5)
1 Modulated noise masker: 3.4 (2.3); 8.3 (5.8); 13.5 (–); 7.3 (0.9); 5.8 (2.2); 4.6 (–); 4.3 (1.1); 5.8 (2.7); 3.1 (5.5); 14.7 (7.4); 5.2 (–)
2 Modulated maskers: 5.0 (–); 8.1 (–)
3 Modulated maskers: 10.2 (–)
1 Static noise masker: 4.0 (4.4); 8.2 (6.0); 13.4 (6.1); 12.3 (6.5); 8.1 (2.1); 7.2 (–); 10.1 (3.5); 9.1 (5.5); 4.8 (3.8); 5.2 (2.6); 6.1 (3.4); 4.7 (7.1); 17.1 (4.2); 5.9 (7.6)
2 Static noise maskers: 8.8 (1.0)
Mixed: 15.2 (–); 2.7 (–); 1.9 (–); 4.5 (1.8); 13.6 (–); 6.0 (–)

Note. Interquartile ranges for each condition are given in parentheses.

Table 3.

Number of Studies Reporting Data for Each of the Target/Masker Combinations in Table 2.

Masker | Short tokens | Words | Digits | Valid sentences | Invalid sentences | Continuous speech | CRM | HINT | IEEE | NU-6 | PB lists | SPIN | SSI | Other
1 Speech masker: 2/10; 1/4; 5/46; 3/13; 1/2; 8/41; 1/6; 2/10; 1/1; 2/10; 1/2
2 Speech maskers: 1/6; 1/3; 2/9; 5/38; 7/42; 1/3; 1/10
3 Speech maskers: 1/2; 4/20
4 + Speech maskers: 2/3; 6/13; 2/13; 2/7; 1/4; 2/3; 4/19; 7/31; 1/4; 9/26
1 Modulated noise masker: 2/5; 3/23; 2/3; 1/4; 4/21; 1/3; 1/4; 1/11; 2/6; 1/12; 3/3
2 Modulated maskers: 1/1; 1/1
3 Modulated maskers: 1/1
1 Static noise masker: 13/55; 13/63; 4/19; 14/32; 6/34; 1/3; 5/9; 6/19; 4/19; 6/9; 8/27; 4/9; 2/8; 27/51
2 Static noise maskers: 1/4
Mixed: 1/1; 1/3; 1/2; 3/15; 1/1; 1/3

Note. Each entry gives the number of studies followed by the number of individual cases.

Figure 3 shows the overall distributions of slope values found for three different masker types: speech, modulated noise, and static noise.[5] In an attempt to disentangle the effect of the type of masker used from the slope effect seen when the number of maskers was increased (see Number of Masking Noises section), only cases where a single masker was used were included in this figure. There is a substantial difference between the three distributions: the measures of central tendency (i.e., median and mean slope values) decreased in value from static noise maskers (median = 7.7% per dB) through modulated noise (median = 6.1% per dB) to speech maskers (median = 3.7% per dB). This last median was considerably shallower than that of the overall median slope reported earlier (median = 6.6% per dB), suggesting that the shallowest end of the distribution was more densely populated by cases that used speech maskers.
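The per-category summaries used in this section (a median slope with an interquartile range for each masker type) can be sketched as below. The slope values are made up purely for illustration and are not taken from the survey; the quartile convention (Python's "inclusive" method) and the function name are likewise assumptions, since the text does not say how the interquartile ranges were computed.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_masker(records):
    """Return {masker type: (median slope, interquartile range)}.

    records is an iterable of (masker_type, slope_in_percent_per_dB) pairs.
    The IQR is None when a group has fewer than two values.
    """
    groups = defaultdict(list)
    for masker, slope in records:
        groups[masker].append(slope)

    summary = {}
    for masker, slopes in groups.items():
        if len(slopes) >= 2:
            q1, _, q3 = quantiles(slopes, n=4, method="inclusive")
            iqr = q3 - q1
        else:
            iqr = None
        summary[masker] = (median(slopes), iqr)
    return summary


# Hypothetical slope values (% per dB), not survey data.
records = [
    ("speech", 2.0), ("speech", 3.0), ("speech", 4.0), ("speech", 5.0), ("speech", 6.0),
    ("static noise", 6.0), ("static noise", 7.0), ("static noise", 8.0),
    ("static noise", 9.0), ("static noise", 10.0),
]
for masker, (med, iqr) in summarize_by_masker(records).items():
    print(f"{masker}: median = {med}, IQR = {iqr}")
```

Medians are used rather than means because the slope distribution is positively skewed, so a few very steep functions would otherwise dominate the summary.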
Figure 3.

The distributions of slope values for three different categories of masker: speech, amplitude-modulated noise, and static noise. The dotted lines indicate the overall median slope value for the survey, while the arrows indicate the median slope value for each specific distribution. Only cases where one masker was used are included.

Number of masking noises

The second major trend is that the slope of the psychometric function tends to increase as the number of maskers increases, at least up to approximately three or four maskers. Table 2 shows that increasing the number of speech maskers from one to two increases the slope by, on average, 4% per dB, which begins to approach the values produced by either a modulated noise or static noise masker. Figure 4 shows the distribution of slope values as a function of the number of maskers used. To avoid a confound of the effect of masker type on slope, only psychometric functions measured using speech maskers were included.[6] It can be seen that the distributions were shifted to the right and to larger values as the number of maskers was increased from one to two, to three or more. Only in the one-masker condition was the median slope value (median = 3.7% per dB) below that of the overall median slope value shown in Figure 2. The distribution in the bottom panel is for cases with 5–20 speech maskers. The distribution, mean, and median slope values for this condition were very similar to those found when three or four maskers were used. This would suggest that once the number of maskers reached three or four, any additional maskers had a negligible effect on the slope.
Figure 4.

The distributions of slopes found when one, two, three or four, or greater than five maskers were used. The dotted line indicates the overall median slope value for the survey, while the arrow indicates the median slope value for each specific distribution. Only cases where speech maskers were used are included.

Minor Trends

Although the type and number of maskers used had a large effect on slope, these factors cannot solely account for all the slope variation seen in the survey. For example, there was a range of 16% per dB between the lowest and highest slope values for cases with one speech masker (see Figure 4, top panel). Several more minor trends in slope will now be briefly described.

Predictability of target speech

Figure 5 compares the slopes of psychometric functions for highly predictable speech targets with those for less predictable speech targets. The data came principally from experiments where the SPIN sentences (Kalikow, Stevens, & Elliott, 1977) were used as targets, as this is the main corpus in which the degree of target predictability is manipulated. The left column includes slope values for speech maskers, whereas the right column includes slope values for noise maskers.[7] For the speech maskers, a clear effect was found, with less predictable targets producing markedly shallower slopes (median = 7.1% per dB) than highly predictable targets (median = 13.8% per dB). This slope difference was smaller when the masker was noise: here, the low-predictability median slope was 5.4% per dB and the high-predictability median slope was 8.6% per dB. In addition to a difference in median slope values, there was also a difference in the width of the distributions of the slope values between the high- and low-predictability targets: When either speech or static noise maskers were used, broader slope distributions were seen for the highly predictable targets than for the less predictable targets.
Figure 5.

The different distributions of slope values found when there was either a high or low probability of target speech being predicted from previous context. The left panels plot these distributions for speech maskers, while the right panels plot these distributions for static noise maskers. The dotted lines indicate the overall median slope value for the survey, while the arrows indicate the median slope value for each specific distribution. Only cases where one masker was used are included.

Target corpus

Figure 6 shows the distributions of slope values for targets taken from various corpora. The slopes measured using four standard speech tests (CRM, HINT, IEEE, and SSI) are displayed separately for speech maskers (left column) and static noise maskers (right column).[8] The data show that when a speech masker is used, the choice of target corpus has little effect on slope (median slopes = 3.7%, 3.4%, 4.5%, and 4.6% per dB for CRM, HINT, IEEE, and SSI, respectively), but a large variation in slope is seen when the masker was a static noise (median slopes = 10.1%, 9.1%, 4.8%, and 17.1% per dB; IEEE gave the lowest while SSI gave the highest).
Figure 6.

The distribution of slope values found for four different speech corpora (CRM, HINT, IEEE, and SSI), when they were presented in speech maskers (left panels) and when they were presented in static noise maskers (right panels). Again the dotted lines in each panel indicate the overall median, while the arrows indicate the median for each category of target, and only cases where one masker was used are included.

Similarity of target and masker voices

Figure 7 shows the distributions of slope values for varying degrees of target/masker voice similarity. The subcategories of similarity include (unprocessed)[9] target and maskers spoken by the same talker, by a different person of the same gender, or by a person of a different gender. These subcategories include cases where only one speech masker was used. The slopes for the same talker category were shallower than those for talkers of different genders (medians of 3.4% compared with 5.0% per dB). The distribution of slopes given when the target and masker were of the same gender but spoken by different people, however, overlaps with each of the other distributions. This wider distribution may reflect the greater variation in similarity for this subcategory, that is, some same-gender voices were likely to be more similar than others.
Figure 7.

The distributions of slope values found for speech maskers with three different levels of talker similarity to the target speech: same talker, same gender talker, and different gender talker. The dotted lines indicate overall median slope, while the arrows indicate individual medians for each distribution. Only cases where one masker was used are included.

Other minor effects

Prior exposure to, or priming, some aspects of either the target or masker before a trial also affects the slope of the psychometric function. Slope values tended to be slightly steeper when either the target or masker sentence was primed compared with when no prime was presented (medians of 7.8% per dB, n = 27 compared with 5.9% per dB, n = 374 for speech maskers, and 8.9% per dB, n = 19 compared with 7.4% per dB, n = 342 for the static noise maskers). Primed cases included (a) acoustic primes, where target or masker voices were primed, (b) linguistic primes, where the content of the target or masker was primed, and (c) dual primes, where both the acoustic and content of the target or masker were primed (e.g., the prime was the start of the test sentence).

The content of the masking speech also has a small effect on slope. When the content of the masker was very similar to that of the target, for example, when they were taken from the same speech corpus, slopes tended to be shallower (median = 4.6% per dB, n = 117) than when the masker content was more linguistically distinct from the target, that is, when they were taken from different speech corpora (median = 6.5% per dB, n = 281). More generally, masking speech whose content was meaningful gave shallower psychometric functions (median = 4.0% per dB, n = 203) than those with non-meaningful content (e.g., time-reversed speech, foreign language speech, invalid sentences, or babble; median = 7.3% per dB, n = 215).

There was also an indication that listener age had an effect on the slope of the psychometric function. There was a trend of increasing slope with age when a speech masker was used (n = 34, 299, and 63, medians = 4.6% per dB, 5.8% per dB, and 7.1% per dB for children, young adults, and older adults, respectively). No effect of age on slope was evident, however, when a static noise masker was used (n = 10, 248, and 44, medians = 8.3% per dB, 7.4% per dB, and 7.8% per dB, for children, young adults, and older adults, respectively).

The hearing ability of the listeners (normal hearing, hearing impaired, or cochlear implant user) was coded for in the survey. In cases using a speech masker, there was a trend of increasing slope with hearing impairment (medians = 6% per dB and 7.5% per dB, n = 345 and 55 for normal hearing and hearing-impaired listeners, respectively); however, a reverse trend was observed in cases using a static noise masker, with slope decreasing with increased impairment (medians = 7.8% per dB, 5.9% per dB, and 2.7% per dB, n = 312, 34, and 15 for normal hearing, hearing-impaired listeners, and cochlear implant users, respectively).

The results for the effect of age and hearing impairment on slope are somewhat tentative, however, as the sample sizes of the groups were particularly unequal in both types of comparison. Further, the two effects are difficult to disentangle as in 98% of cases including young listeners, the listeners were also normal hearing, and in 70% of cases including older listeners, the listeners were also hearing impaired, thus partially confounding the effects of age and hearing impairment.

Nonmonotonic Psychometric Functions

As previously noted, any cases where the data had to be extrapolated to fit a logistic function, or cases where the logistic functions were a poor fit to the data, were excluded from the slope survey. The latter was mostly due to extremely shallow or unusual psychometric functions. These generally took two forms: functions where performance plateaued over a specific SNR range (usually −12 to 0 dB) before increasing at higher SNRs, and functions with dips where performance instead decreased over this SNR range before increasing (see Figure 1, bottom panels). Twenty-three of the cases that were excluded from the survey due to high RMS values were nonmonotonic in shape (e.g., plateaus or dips). The majority of functions in this subset were from speech maskers where only one masker was used (19 of 23). While these nonmonotonic psychometric functions were measured using several different speech stimuli, the two largest contributors were CRM stimuli (10/23) and valid sentences (5/23). Most occurred when the same talker was used in the target and the masker (18/23), whether the target was unprocessed (9/23), processed (e.g., vocoded, 7/23), or mixed with other maskers (2/23). The listening conditions giving the shallowest slopes fit with the trends reported earlier for shallow slopes identified in the main slope survey.

Discussion

We systematically surveyed the published data on the psychometric functions for speech intelligibility to identify the main factors that affect their slope. Large variations in slope were found, with slopes ranging from as shallow as 1% per dB to as steep as 44% per dB. The median value across 139 studies (885 cases) was 6.6% per dB. The type and number of maskers used were the major factors affecting the slope of the psychometric function. Other minor effects of target predictability, target corpus, and target/masker similarity were also found. There was also an indication that age and hearing impairment might affect slope, although it was not possible for the current survey to completely disentangle these two effects.
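As a rough illustration of how a slope in % per dB can be read off a fitted logistic function, the sketch below assumes the standard two-parameter logistic form and takes the slope at the 50% point. The survey's own equation and fitting procedure are described in its Method section and may differ; the function names, the logit-linearization fit, and the example data are all hypothetical.

```python
import math

def logistic(snr_db, midpoint_db, rate):
    """Proportion correct for an assumed two-parameter logistic function."""
    return 1.0 / (1.0 + math.exp(-rate * (snr_db - midpoint_db)))

def fit_logistic(snrs_db, proportions):
    """Fit by ordinary least squares on logit(p), which is linear in SNR.

    Proportions of exactly 0 or 1 must be excluded or nudged beforehand.
    Returns (midpoint_db, rate).
    """
    logits = [math.log(p / (1.0 - p)) for p in proportions]
    n = len(snrs_db)
    mean_x = sum(snrs_db) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in snrs_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snrs_db, logits))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    return -intercept / rate, rate

def slope_percent_per_db(rate):
    """Slope at the 50% point: the logistic derivative there is rate / 4."""
    return 100.0 * rate / 4.0

# Hypothetical, noise-free scores generated from a known function.
snrs = [-12.0, -9.0, -6.0, -3.0, 0.0, 3.0]
scores = [logistic(s, -5.0, 0.3) for s in snrs]

midpoint, rate = fit_logistic(snrs, scores)
print(f"threshold = {midpoint:.2f} dB, slope = {slope_percent_per_db(rate):.2f}% per dB")
```

With real data, a maximum-likelihood fit would usually be preferable to the logit linearization used here, which is chosen only to keep the example dependency-free.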

Slope Changes as a Consequence of Fluctuating Maskers

Our analyses have clearly demonstrated that masker type affects the slope of the psychometric function, with speech maskers found to give shallower slopes than noise maskers, be they amplitude modulated or static noise. The number of speech maskers used also affected the slope of the psychometric function, with the slope of the function increasing as the number of maskers was increased from one to about three or four. Given that speech can be thought of as the sum of multiple amplitude-modulated frequency bands (Drullman, Festen, & Plomp, 1994) and that increasing the number of maskers will alter the quality of the amplitude variations (Cooke, 2006; Miller, 1947), both of these effects indicate the importance of masker amplitude modulations on slope. The effects of amplitude modulation on slope can be understood by considering glimpsing (Figure 8). When target speech is presented in a fluctuating masker, there will be instances in which the speech sounds coincide with amplitude minima (or dips) in the masking waveform. In these dips local SNR is increased, allowing the listener to glimpse the target speech signal (Cooke, 2006; Miller & Licklider, 1950). These glimpses can greatly improve speech intelligibility and so lower speech reception thresholds, as the information they provide can help to identify even the parts of the speech that are still masked (Miller, 1947; Takahashi & Bacon, 1992; Wilson & Carhart, 1969). Thus, amplitude modulations increase the SNR range over which target speech will remain audible (Rhebergen & Versfeld, 2005), as glimpses of target speech may remain even as SNR is decreased. The result is a shallower psychometric function for modulated maskers than for static maskers (Speaks et al., 1967).
Figure 8.

A schematic illustration of the nonlinear increase in speech intelligibility that arises with amplitude-modulated maskers. Panels (a) to (c) represent a speech signal presented in a static noise. As SNR is decreased (i.e., the masker is increased), the proportion of the signal that is audible decreases, as does speech identification. Panels (e) to (g) illustrate the same speech signal presented in an amplitude-modulated noise. This time, as SNR decreases, glimpses of the target are still available, which can be used to aid in speech identification. Even at the lowest SNR in Panel (g), a large proportion of these glimpses still remain. Panel (d) shows an example psychometric function for speech (CRM sentences) in a static noise, and Panel (h) shows an example psychometric function for the same speech stimuli in an amplitude-modulated masker.

When a single competing talker is used as the masker, the temporal fluctuations are relatively slow, and there are likely to be many opportunities where the target speech will coincide with a dip in the amplitude of the masker, that is, there will be many opportunities for glimpsing the target speech (Miller & Licklider, 1950). As more maskers are added, the spectral and temporal dips begin to fill (Cooke, 2006; Miller, 1947). The chance that the target will temporally overlap with at least one of the maskers becomes greater, and overall amplitude modulations in the masking mixture effectively become shallower and briefer. The opportunities for glimpsing the target, therefore, become fewer. The reduced opportunity for glimpsing leads to an increase in slope. In the extreme case, if enough voices are added to the masking signal, then the masker would approach a speech-shaped static noise (e.g., Cooke noted that when six or more masking voices were present, intelligibility was not significantly different from that of a speech-shaped static noise masker). Our analyses demonstrate that only three or four masking voices are needed before the slopes of psychometric functions become equivalent to those given by a static noise.

Curiously, we found that amplitude-modulated noises did not give substantially shallower slopes than the static noise maskers, as might be expected by this glimpsing argument (see Figure 3). This could possibly be explained by the wide range of maskers that fell into the category of modulated noise, that is, any noise masker whose amplitude was temporally varied regardless of modulation depth, frequency, or duration. Modulation depths ranged from 1 to 48 dB, and modulation rates varied from 1 to 100 interruptions per second. Not all modulated maskers will result in a flattening of the psychometric function; for instance, only fluctuations with relatively long durations (greater than 200 ms) have been found to give shallower psychometric functions than nonmodulated maskers (Howard-Jones & Rosen, 1993). It is likely that the survey did not capture the subtle effects that experimental manipulation of amplitude modulations has on slope over and above the consistent modulations exhibited by speech maskers.

There was an indication from the survey that older, hearing-impaired listeners tended to give steeper psychometric functions than young normal hearing listeners when speech was presented in a competing speech masker. This finding accords with the slope pattern that would be expected if this listener group were less able to make use of brief dips in the power of background noise to help identify target speech, as has previously been suggested (e.g., Festen & Plomp, 1990). This reduced glimpsing ability for older, hearing-impaired listeners has been attributed to a reduced temporal resolution (Lutman, 1991; Schneider, 1997) and, in the case of listeners with normal hearing thresholds but with deficits listening in noisy environments, to reduced fidelity when encoding suprathreshold sounds (Bharadwaj, Verhulst, Shaheen, Liberman, & Shinn-Cunningham, 2014). Reduced glimpsing would, in general terms, result in an amplitude-modulated masker acting more like a static noise masker, which would lead to a steeper psychometric function.
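The glimpsing argument in this section can be made concrete with a deliberately simplified simulation. The sketch below is not a model used in the survey: it assumes a constant-level target, maskers whose level varies sinusoidally in dB at a random rate and phase, and a glimpse defined as any frame whose local SNR exceeds a criterion. All names and parameter values are hypothetical.

```python
import math
import random

def masker_envelope_db(n_frames, n_maskers, depth_db, rng):
    """Per-frame level of a sum of maskers, in dB re its long-term mean power.

    Each masker's level swings +/- depth_db / 2 sinusoidally; depth_db = 0
    gives a static masker.
    """
    power = [0.0] * n_frames
    for _ in range(n_maskers):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        cycles = rng.uniform(3.0, 8.0)  # arbitrary modulation rate
        for i in range(n_frames):
            angle = 2.0 * math.pi * cycles * i / n_frames + phase
            level_db = 0.5 * depth_db * math.sin(angle)
            power[i] += 10.0 ** (level_db / 10.0)
    mean_power = sum(power) / n_frames
    return [10.0 * math.log10(p / mean_power) for p in power]

def glimpse_proportion(envelope_db, snr_db, criterion_db=0.0):
    """Fraction of frames in which the local SNR exceeds the criterion."""
    hits = sum(1 for level in envelope_db if snr_db - level > criterion_db)
    return hits / len(envelope_db)

rng = random.Random(1)
conditions = {
    "static noise": masker_envelope_db(2000, 1, 0.0, rng),
    "one fluctuating masker": masker_envelope_db(2000, 1, 30.0, rng),
    "four fluctuating maskers": masker_envelope_db(2000, 4, 30.0, rng),
}
for name, envelope in conditions.items():
    row = [glimpse_proportion(envelope, snr) for snr in (-12, -6, 0, 6)]
    print(name, [round(value, 2) for value in row])
```

In this toy setting the static masker switches abruptly from no glimpses to all glimpses as overall SNR crosses the criterion, whereas a single fluctuating masker leaves some glimpses available over a wide SNR range, which is the pattern associated above with a shallower function.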

Slope Changes as a Consequence of Target/Masker Confusion

The slope survey identified 23 cases where the psychometric function was nonmonotonic. Most of these functions were produced when a speech masker was used, and nearly all of those functions were given when at least one of the speech maskers was spoken by the same voice as the target. These results suggest that a high degree of similarity between the target and the masker is required to give nonmonotonic psychometric functions. The survey also demonstrated that even when psychometric functions were monotonic, manipulating the acoustic similarity of the target to the speech masker affected the slope, as shallower slopes were found when the target and masker voices were spoken by the same person than when they were spoken by people of different genders. Linguistic similarity between the target and masker, that is, if they were both taken from the same speech corpus, also tended to result in shallower psychometric functions. Conversely, there was the suggestion that providing a cue that could aid in the differentiation of a target from a masker when both were speech, such as providing a prime of the target voice or content, could steepen the slope of the psychometric function. These effects combined indicate that the degree of confusion that exists between a target and a masker can be a factor in the resultant slope of the psychometric function. The role of confusion on slope can be explained by increased reliance on a level difference between target and masker signals (Brungart, 2001a; Dirks & Bower, 1969; Egan et al., 1954). Such reliance is thought to occur when difficulties arise disentangling elements of a target signal from a similar sounding masker signal. In such cases, if the target is either less intense or more intense than a masker, then the level difference can be used as a cue to distinguish which sound is which. The greater the reliance that is placed on this cue, the more dissociated intelligibility is likely to become from overall SNR. 
Intelligibility can in principle be better at negative SNRs, where a clear level difference exists between the two signals, than at SNRs near zero, where the level difference is smaller. Extreme confusion between a target and a masker (i.e., where both signals are spoken by the same person) can, therefore, have the effect of flattening the slope of the psychometric function or even giving a dip in the function near 0 dB (i.e., where there is no level difference cue available).
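To show how reliance on a level-difference cue could in principle produce a plateau or dip near 0 dB, the following toy calculation multiplies a monotonic audibility term by a segregation term that is weakest when target and masker levels are equal. This is purely illustrative and is not a model fitted or proposed in the survey; the functional forms and every parameter value are assumptions.

```python
import math

def audibility(snr_db, midpoint_db=-12.0, rate=0.4):
    """Monotonic component: a logistic function of SNR (assumed form)."""
    return 1.0 / (1.0 + math.exp(-rate * (snr_db - midpoint_db)))

def segregation(snr_db, confusion=0.6, width_db=4.0):
    """Probability of attending to the correct talker (assumed form).

    Lowest at 0 dB, where no level difference separates target and masker;
    confusion = 0 removes the effect entirely.
    """
    return 1.0 - confusion * math.exp(-((snr_db / width_db) ** 2))

def proportion_correct(snr_db, confusion=0.6):
    """Product of the two components."""
    return audibility(snr_db) * segregation(snr_db, confusion)

for snr in (-12, -9, -6, -3, 0, 3, 6):
    same_talker = proportion_correct(snr, confusion=0.6)
    distinct_talker = proportion_correct(snr, confusion=0.0)
    print(f"{snr:+3d} dB  confusable: {same_talker:.2f}  distinct: {distinct_talker:.2f}")
```

With the confusion term switched off, the function rises monotonically; with it switched on, performance at 0 dB falls below performance at −6 dB, giving the kind of dip described for same-talker maskers.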

Slope Changes as a Consequence of the Availability of Top-Down Information

The survey demonstrated that target stimuli that contained keywords that were predictable from their content gave steeper slopes than those whose keywords were unpredictable. It was also demonstrated that targets taken from some speech corpora gave shallower slopes than others. The speech corpora whose targets tended to give shallow slopes were commonly open-set such as the IEEE corpus (Rothauser et al., 1969). Conversely, targets taken from closed-set corpora, such as the SSI (Speaks & Jerger, 1965), gave the steepest slopes. These effects indicate that the relative contributions of perceptual and cognitive factors may influence slope. Pichora-Fuller et al. (1995) suggested that congruent previous context constrains possible word options, shifting the influence of word identification from perceptual (bottom-up) to cognitive (top-down) information. The mechanism is essentially positive feedback; with a greater dependence on top-down information, word identification can increase more rapidly with changes in level as small increases in acoustic information may be sufficient to further constrain possible speech elements. The probability of other speech elements then being guessed correctly increases, resulting in a steepening of the psychometric function (Bronkhorst et al., 1993). If, however, there is little top-down information available to constrain word options or if this information is incongruent with the rest of the utterance (as is the case when keywords are unpredictable), intelligibility will be based on bottom-up information alone and will thus increase more slowly as level is increased, giving a relatively shallow psychometric function. Several individual studies have clearly demonstrated this effect (Dirks et al., 1986; Dubno, Ahlstrom, & Horwitz, 2000; Elliott, 1979; Kalikow et al., 1977; Lewis et al., 1988; Pichora-Fuller et al., 1995). 
Aside from slopes being generally steeper when target speech could be predicted from its context, it was also noted in the current survey that distributions of slope values tended to be broader for such targets than for those whose content was unpredictable. It is possible that this difference in slope distributions reflects a variation in the reliance on context and top-down information by different listeners across studies. It has been suggested, for example, that older listeners can benefit more from supportive context than younger listeners can (Pichora-Fuller et al., 1995). A greater reliance on context would, as mentioned earlier, have a tendency to steepen the slope of the psychometric function while a greater reliance on perceptual information would have a tendency to flatten the slope of the psychometric function. A shift in the balance of these two strategies may, in part, be the reason that steeper slopes were seen in the current survey for older, hearing-impaired listeners than for younger, normal hearing listeners, and the greater variation in the use of context by listeners across studies may explain the broader slope distribution for predictable, compared with unpredictable, target utterances. The number of possible responses available in a speech test can also alter the relative contributions of perceptual and cognitive factors in speech identification. The SSI, for example, is usually presented as a closed-set corpus (Speaks & Jerger, 1965) in which listeners are asked to match presented sentences to a list of a possible 10 sentences. Top-down information in this case can very effectively constrain identification; only part of the sentence needs to be audible for identification to be successful. Small changes in audibility, therefore, can have large effects on intelligibility resulting in a steep slope. The IEEE corpus, on the other hand, is open set (Rothauser et al., 1969), as it consists of 720 sentences on different topics. 
Top-down information is far less constraining in this case. Although the context of the sentence may allow some top-down influence, speech identification will be much more heavily dependent on bottom-up information for these speech stimuli compared with the SSI, thus giving less improvement in intelligibility as SNR is increased and so a shallower psychometric function. The CRM corpus is also a closed set, offering 32 response options. The survey demonstrated that despite this, the CRM corpus tended to give relatively shallow slopes (e.g., 3.7% per dB with a single speech masker). This may be partially explained by the fact that there are no contextual or semantic cues available in CRM sentences to aid in the identification of the keywords. The CRM keywords are likely less constrained by top-down information than the SSI corpus. Also, studies that used CRM sentences as targets also commonly used CRM sentences as maskers. This increased similarity between the target and masker, as described in the Slope Changes as a Consequence of Target/Masker Confusion section, may also explain the shallower than expected psychometric functions for this particular speech corpus when presented in a speech masker.
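One way to see why top-down constraint could steepen the function is to borrow the k-factor relation of Boothroyd and Nittrouer (1988), p_context = 1 − (1 − p_no_context)^k, which is not used in the survey and is brought in here only as an assumed description of context benefit. The sketch below applies it to an assumed logistic function for context-free identification and evaluates the slope at the 50% point analytically; the rate value is hypothetical.

```python
def slope_at_half_percent_per_db(rate, k):
    """Slope (% per dB) at the 50% point of p_c = 1 - (1 - p_i) ** k.

    p_i is an assumed logistic function of SNR with the given rate, so
    dp_i/dSNR = rate * p_i * (1 - p_i). k = 1 means no context benefit.
    """
    q = 0.5 ** (1.0 / k)          # value of (1 - p_i) at which p_c = 0.5
    p_i = 1.0 - q
    dp_i = rate * p_i * q         # logistic derivative at that point
    return 100.0 * k * q ** (k - 1) * dp_i

rate = 0.2  # hypothetical logistic rate for context-free identification
for k in (1.0, 2.0, 4.0):
    slope = slope_at_half_percent_per_db(rate, k)
    print(f"k = {k:.0f}: slope at 50% = {slope:.2f}% per dB")
```

Under these assumptions, larger k (stronger contextual constraint) gives a steeper slope at the 50% point, in the same direction as the effect described for predictable and closed-set targets, although the size of the effect in this toy relation need not match the survey's medians.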

Conclusions

The slope of the psychometric function for masked speech varies greatly (mean, 7.5% per dB; range, 0–44% per dB). Understanding the factors affecting the slope of the psychometric function and the mechanisms that underlie these slope changes is important, as it gives a means of gauging the amount of perceptual benefit that can be expected given a specific change in SNR in a specific listening condition. The survey of 885 psychometric functions has demonstrated that the type and number of speech maskers both had an effect on slope as did the choice of target corpus, its predictability, and its similarity to the masker. Three broad underlying mechanisms were outlined to explain why there is such a large variation across listening conditions: slope changes resulting from amplitude modulations in the masker, from confusion between the target and the masker, and from the availability of top-down information. In particular, single speech maskers are likely to give particularly shallow slopes, as they contain amplitude modulations that offer extensive opportunities for glimpsing while still sharing acoustic and linguistic features that may become confused with the target speech.

The current survey has highlighted that the slope of the psychometric function, and therefore the amount of perceptual benefit that can be gained from an increase in SNR, is not fixed but instead varies greatly depending on both target and masker selection. These findings would suggest that care needs to be taken in selecting both target and masker stimuli for speech research, with consideration given to the likely shape of the psychometric function as well as the likely threshold.

That the slope of the psychometric function can vary so much is particularly pertinent for listeners who struggle with speech-in-noise understanding and who rely on a hearing aid to provide improvement in speech audibility. The slope for these listeners will relate directly to the amount of benefit they might expect to receive from their hearing aid. The current study was unable to ascertain the direct effects that hearing impairment and age had on the slope of the psychometric function. These effects are an important direction for future research, as an understanding of them is crucial if we wish to quantify the amount of perceptual benefit a listener is likely to gain from any change in SNR offered by a hearing aid.
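As a simple worked example of how slope translates into expected benefit, the sketch below multiplies the single-masker median slopes reported in the survey by a hypothetical 3 dB SNR improvement. It assumes the listener is operating on the steepest, roughly linear part of the psychometric function, so the values are upper-bound approximations and not predictions for any particular listener or device.

```python
# Median slopes (% per dB) for single maskers, as reported in the survey.
MEDIAN_SLOPES = {
    "speech masker": 3.7,
    "modulated noise masker": 6.1,
    "static noise masker": 7.7,
}

def expected_benefit(slope_percent_per_db, snr_gain_db):
    """Linear approximation: intelligibility gain in percentage points."""
    return slope_percent_per_db * snr_gain_db

SNR_GAIN_DB = 3.0  # hypothetical improvement, e.g., from signal processing

for masker, slope in MEDIAN_SLOPES.items():
    gain = expected_benefit(slope, SNR_GAIN_DB)
    print(f"{masker}: about {gain:.1f} percentage points")
```

The same 3 dB improvement yields roughly twice the intelligibility gain in static noise as in a single competing talker, which is the practical consequence of the slope differences summarized above.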
