
The effects of auditory spatial training on informational masking release in elderly listeners: a study protocol for a randomized clinical trial.

Farnoush Jarollahi¹, Marzieh Amiri¹, Shohreh Jalaie², Seyyed Jalal Sameni¹.

Abstract

Background: Given the strong capacity of the central auditory system for auditory spatial plasticity and the demonstrated effect of short-term and long-term rehabilitation programs in elderly people, auditory spatial training may help this population achieve release from informational masking and better track speech in noisy environments. The main purposes of this study are to develop an informational masking measurement test and an auditory spatial training program. Protocol: This study will be conducted in two parts. Part 1: develop and determine the validity of an informational masking measurement test by recruiting two groups of young (n=50) and old (n=50) participants with normal hearing who have no difficulty in understanding speech in noisy environments. Part 2 (clinical trial): two groups of 60-75-year-olds with normal hearing, who complain about difficulty in speech perception in noisy environments, will participate as control and intervention groups to examine the effect of auditory spatial training. Intervention: 15 sessions of auditory spatial training. The informational masking measurement test and the Speech, Spatial and Qualities of Hearing Scale will be compared between the two groups before intervention, immediately after intervention, and five weeks after intervention. Discussion: Since existing auditory training programs do not address informational masking release, an auditory spatial training program will be designed, aiming to improve hearing in noisy environments for elderly populations. Trial registration: Iranian Registry of Clinical Trials (IRCT20190118042404N1) on 25th February 2019.


Keywords: Elderly; Energetic masking; Informational masking; Speech perception in noise; Train

Year: 2019 PMID: 31354946 PMCID: PMC6652096 DOI: 10.12688/f1000research.18602.2

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

Many older adults complain about speech perception in noisy situations. It is clear that poorer speech recognition in the elderly population can occur due to various factors, including peripheral hearing impairment or decline in cognitive capabilities and processing defects at supra threshold levels. It is difficult to determine the exact role of each of these factors in developing speech problems in the elderly [1]. Most elderly individuals complain about the difficulty in understanding speech in noisy environments, despite having normal hearing thresholds [2]. The spectro-temporal overlap between target and competing speech, which leads to poor target identification, is called energetic masking [3]. Energetic masking is caused by physical interactions between target and competing speech [4, 5] at the low level of the peripheral auditory system [6]. Recent research has suggested that when competing signals occur randomly or when there is a high similarity between target and competing signals (for example, when both signals are speech), another type of masking occurs. This type of masking, which occurs in response to uncertainty of the competing signal or similarity between target and competing signals, is called non-energetic or informational masking [7, 8]. It leads to failure in selecting auditory objects and therefore impairs auditory scene analysis. Generally, in contrast with the energetic masking, which occurs due to the limitations caused by frequency selectivity at peripheral levels, informational masking reflects the processing capacity limitations at central auditory levels [9]. So, the type of background noise heavily influences the extent of the damage imposed to speech intelligibility [1]. Generally, hearing problems worsen in noisy environments when the target speech is covered by a competing signal. In this situation, in addition to energetic masking, informational masking also occurs [10]. 
Various studies have indicated that the detrimental effects of competing noise increase with age [1, 10–12]. Studies have also revealed that elderly people without peripheral auditory impairment are less able than young people to use acoustic and phonetic cues to separate speech from background noise; therefore, more informational masking occurs in this population [10, 13]. Since problems with understanding speech reduce the social interactions of the elderly population, it is very important to develop effective auditory rehabilitation programs to prevent their isolation and to improve their quality of life [1]. Auditory spatial processing plays an important role in speech recognition in complex noisy environments [14], since it enables the listener to differentiate the target signal from competing signals via auditory scene analysis and the formation of auditory streams [15]. Different studies have shown that the most important cue for informational masking release is spatial separation of target and competing signals [16–18]. In addition, it has been shown that auditory spatial processing ability is lower in the elderly with normal hearing than in young people. The reduction in localization accuracy and in the benefit gained from auditory spatial processing, and the consequent decrease in binaural processing, are not entirely explained by impaired hearing thresholds [19, 20]. Hence, it leads to poorer speech recognition in noisy environments in elderly people with normal hearing [14]. It has also been shown that the elderly population needs a higher signal-to-noise ratio than young people for speech recognition in the presence of noise [11, 21]. These changes are possibly due to a reduced ability to use acoustic and phonetic cues to separate target signals from background noise [11].
Therefore, in elderly people, regardless of peripheral hearing impairment, the ability to use spatial and non-spatial cues for informational masking release diminishes due to reduced cognitive processing abilities [11, 12], temporal processing deficits [10], deficits in the connection between hemispheres, and diminished ability to separate simultaneous sounds [11]. Current neuroscientific studies have suggested that the central auditory system has a strong capacity for neuroplasticity in auditory spatial processing [22, 23]. Since the effect of short-term and long-term auditory rehabilitation programs has been demonstrated in elderly people [2], it seems that auditory spatial training can help the elderly population achieve informational masking release and prevent them from missing conversations in noisy environments. The present study has two parts. The first part will develop a test for measuring and evaluating informational masking. Among the materials used in research on informational masking, coordinate response measure (CRM) sentences appear to be a good option for differentiating between energetic and informational masking [24]. Using these phrases as both the target and competing signals produces high semantic and syntactic similarity, which is an important factor in inducing informational masking [25]. Since a Persian version of these sentences is not available, we will prepare one and then develop a new informational masking measurement test. The second part of this research will be a clinical trial, in elderly people with normal hearing, of an auditory spatial training program intended to diminish informational masking.
The main hypothesis for the second part of the study is that presenting an auditory spatial training for elderly people would be effective in the improvement of speech recognition in noisy environments by stimulating the centers related to binaural processing.

Protocol

This is version 2 of this protocol.

Study design, setting and participants

This research will be conducted in the Audiology Clinic of Rehabilitation School of Iran University of Medical Sciences. This study consists of two main parts. Part 1: develop and determine an informational masking measurement test and explain its validity characteristics in a test development study, conducted cross-sectionally. The study population will be a group of elderly (60 to 75 years old) and a group of young (20 to 40 years old) people. The young people will be recruited from rehabilitation students of Iran University of Medical Sciences, while elderly people are those referred to the audiology clinics of Iran University of Medical Sciences. Part 2 (simple randomized clinical trial): the effect of training on informational masking release. This part of study is a simple randomized clinical trial design and patients will be randomly assigned into two groups of control (not receiving auditory spatial training) and intervention (receiving auditory spatial training). The random allocation will be performed based on balanced randomization [1:1] where the allocation will be applied by random number table (those assigned an odd number, control group; those assigned an even number, intervention group). This allocation sequence will be generated by one of the audiology clinic staff of the IUMS who will not have any role in the study. An elderly population, 60–75-years-old, who are referred to the audiology clinics of Iran University of Medical Sciences will be selected. The two groups will be matched for age and gender. Those in the control group will not receive any rehabilitation programs during the study. 
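The parity rule described above (odd number: control; even number: intervention) can be sketched in a few lines. This is only an illustration: a pseudo-random generator stands in for the printed random number table, and the function name, the two-digit range, and the seed are assumptions rather than details of the protocol. A plain parity rule does not by itself force equal group sizes, and the text does not say how the 1:1 balance is enforced.

```python
import random


def allocate_by_parity(participant_ids, seed=None):
    """Assign each participant to a group from the parity of a random number.

    Hypothetical helper: random.Random stands in for a random number table.
    """
    rng = random.Random(seed)
    allocation = {}
    for pid in participant_ids:
        number = rng.randint(0, 99)  # stand-in for a two-digit table entry
        # Odd number -> control group; even number -> intervention group.
        allocation[pid] = "control" if number % 2 == 1 else "intervention"
    return allocation


# Example with 36 hypothetical participant IDs (18 per group is the target).
groups = allocate_by_parity(range(1, 37), seed=2019)
counts = {g: list(groups.values()).count(g) for g in ("control", "intervention")}
print(counts)
```

Because the generator is seeded, the same seed reproduces the same allocation sequence, which mirrors the idea of a sequence prepared in advance by someone outside the study team.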
Inclusion criteria (for all participants in the study): auditory thresholds ≤25dB within the 250–8000Hz frequency range for ensuring the normal pure tone audiogram or normal peripheral hearing, ensuring lack of salient cognitive problems using Mini Mental State Examination (MMSE) [26]; having diploma or higher degree; right-handedness (using Edinburgh handedness inventory); speaking Farsi and being monolingual; complaint about speech in noise perception difficulties (just for those in part 2 of the study); and normal condition of middle ear function. Exclusion criteria (for all participants in the study): unwillingness for participation in each step of research and not meeting inclusion criteria.

Study procedures

The coordinate response measure (CRM) has frequently been introduced as one of the most popular speech materials for evaluating informational masking [24]. These sentences share the same rigid structure, with the format “Ready [call sign] go to [color] [number] now”. Eight call signs, four colors, and eight numbers from 1 to 9 can be used. These sentences will be expressed by speakers of different genders [24, 25, 27]. In the present study, 256 sentences will be created for each speaker (8 × 4 × 8). Sentences will be spoken by eight speakers (four women and four men), providing a total of 2048 sentences. Although CRM stimuli were initially designed to measure speech perception in the presence of competing signals, these speech materials provide no contextual information; i.e. the color or number in a phrase cannot be predicted. This is an important factor in measuring informational masking [25]. Since there is no Persian version of these sentences, this research will prepare the sentences and determine their content validity, face validity, and reliability. After the nouns, colors, and numbers used in the sentences have been selected in conformity with the original English version, the prepared sentences will be given to experts in this field (audiologists, speech therapists, and linguists) to determine the content validity. These experts are academic members of the rehabilitation schools of IUMS, Tehran University of Medical Sciences (TUMS) and Shahid Beheshti University of Medical Sciences (SBUMS). They will be emailed a questionnaire to score the validity items (see Table S1, Extended data). After the pattern that best matches the Persian language has been selected, all sentences will be recorded in a studio with the eight speakers, based on the model presented for recording the sentences.
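The sentence counts above follow directly from the fixed sentence frame. As a minimal sketch, the full set can be enumerated as shown below. The token labels and speaker IDs are placeholders, since the Persian call signs, colors, and numbers are still to be selected, and the English frame is used only for illustration.

```python
from itertools import product

# Placeholder tokens: the actual Persian items are not given in the protocol,
# so generic labels stand in for them.
CALL_SIGNS = [f"callsign{i}" for i in range(1, 9)]  # 8 call signs
COLORS = [f"color{i}" for i in range(1, 5)]         # 4 colors
NUMBERS = [f"number{i}" for i in range(1, 9)]       # 8 numbers
SPEAKERS = [f"F{i}" for i in range(1, 5)] + [f"M{i}" for i in range(1, 5)]  # 4 women, 4 men


def sentences_for_speaker(speaker):
    """Return every (speaker, sentence) pair built from the CRM frame."""
    return [
        (speaker, f"Ready {call} go to {color} {number} now")
        for call, color, number in product(CALL_SIGNS, COLORS, NUMBERS)
    ]


corpus = [item for spk in SPEAKERS for item in sentences_for_speaker(spk)]
print(len(sentences_for_speaker("F1")), len(corpus))  # 256 2048
```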
To record the sentences, all criteria of the English version will be followed, including a sampling rate of 44.1 kHz and allowing speakers 3 s to produce each sentence. All the sentences will then be scaled, and all the words in the CRM will be aligned so that they occur simultaneously across sentences (hence "coordinate" sentences) [25]; each sentence will be filtered using a band-pass filter of 80 to 8000 Hz. To determine the face validity, the recorded sentences will again be given to the experts mentioned above, who will judge their suitability by completing a questionnaire emailed to them. To determine the reliability of the CRM speech materials, the mean CRM recognition scores in quiet will be evaluated in a group of young and old people with normal peripheral hearing who do not have speech perception difficulties in noisy environments. There will be one preliminary test and then a re-test. This evaluation will be carried out at the participants' comfortable hearing level. The score will be calculated as the percentage of sentences recognized correctly. A sentence will be scored as correct when the color-number combination is recognized correctly [25, 27]. In this study, the mean correct scores for color, number, and noun will also be examined separately in order to calculate the error percentage for each of them [24]. Once prepared, these sentences can be used in part 2 of the study. The best way to measure an informational masking score is to determine the speech recognition score in the presence of meaningful and meaningless competing noise. To this end, the recognition score for the Persian version of the CRM speech corpus will be measured under two conditions: A. 
In the presence of meaningful competing noise: The competing signal will be selected from the Persian version of the CRM corpus; the call sign, color and number used in the competing sentence will differ from those of the target sentence, and it will be spoken by a different speaker. The individual will be trained to pay attention to certain target call signs and ignore other signals [24, 25, 27]. One of the important factors in informational masking is strong similarity between target and competing signals (as when both are speech) [7, 8]; using CRM sentences as both the target and the competing signal produces high semantic and syntactic similarity between them [24, 25]. B. In the presence of meaningless competing signal: the previous signal will be manipulated such that its spectral content is preserved but it becomes meaningless, so that energetic masking will remain but informational masking will be reduced. "Time-reversed speech" will be used for this purpose. This is one of the most effective methods in behavioral and neurophysiological research on the effects of speech signals on each other. In this method, the long-term acoustic spectra of the two signals are kept fixed while one of them is manipulated: it is divided into non-overlapping time segments, each segment is presented reversed in time, and the segments are joined together. The result is a signal that matches the original in spectrum but is not understandable [28]. With 20–40 millisecond time windows, this method does not effectively render the speech signal unintelligible; therefore, longer time windows should be used [28]. MATLAB R2018 software will be used to construct this signal. The level settings for the speech signals with two competing talkers were similar to those of some previous studies [29]. The overall level (RMS power) of the target CRM was fixed at 60 dB SPL. 
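The segment-wise time reversal described for condition B can be illustrated with a short sketch. It is written in Python rather than MATLAB purely for illustration; the function name, the window length, and the choice to keep the reversed segments in their original order are assumptions, since the protocol only states that windows longer than 20–40 ms should be used.

```python
def reverse_in_segments(samples, sample_rate, window_ms):
    """Reverse a signal within consecutive non-overlapping time windows.

    samples: sequence of audio samples; sample_rate in Hz; window_ms in ms.
    Segment order is preserved (an assumption); a shorter final segment is
    reversed on its own.
    """
    seg_len = max(1, int(sample_rate * window_ms / 1000))
    out = []
    for start in range(0, len(samples), seg_len):
        segment = samples[start:start + seg_len]
        out.extend(reversed(segment))  # play this window backwards
    return out


# Toy example: 7 samples at 1 kHz with 3 ms windows (3 samples per window).
print(reverse_in_segments([1, 2, 3, 4, 5, 6, 7], 1000, 3))
# [3, 2, 1, 6, 5, 4, 7]
```

Every sample is kept and only its position within a window changes, so the overall energy of the signal is unchanged; the window length is the parameter that controls how unintelligible the result becomes.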
The overall level (RMS power) of each masker was adjusted relative to the target’s level to produce the target-to-masker ratios (TMRs). Initially the masker level was varied in 4 dB steps and then adaptively in 2 dB steps. The two CRM masker phrases always had the same RMS power. The TMRs in the sentence recognition test in conditions A and B were ±8, ±4, and 0 dB. The target signal is always presented from a loudspeaker at 0 degrees azimuth and the two competing signals from loudspeakers at ±45, ±90 and 0 degrees azimuth (once with spatial separation and twice in the direction of target signal), once with a competing signal of the same gender as the target signal and once with a different gender. As a result, 30 conditions will be evaluated at each step (five TMRs and three spatial angles with two different genders). Finally, the informational masking score in all 30 conditions will be calculated as follows: informational masking score (percentage) = speech recognition score in the meaningful competing noise condition − speech recognition score in the non-understandable noise condition. Jakien and Gallun provided mathematical equations by which the effects of age can be predicted for 45 degrees of separation [30]. We will develop similar equations, compare them with the published equations, and use these normative functions to assess improvements in performance after training. In this step of the research, construct validity will be used to determine the validity of the test. For this purpose, the Speech, Spatial, and Qualities of Hearing Scale (SSQ) questionnaire score of each individual will be compared against the informational masking score [31]. Figure S1 (Extended data) represents the participant timeline of the first part of the study. The second part of the research will be conducted in three steps: before auditory spatial training, during training, and after training. 1. 
Assessments prior to auditory spatial training (preliminary interview)
- Obtain patient history to confirm the inclusion and exclusion criteria of the participants.
- Perform the initial clinical examination, including otoscopy and tympanometry.
- Perform the pure tone audiometry test.
- Administer the MMSE questionnaire to ensure the lack of salient cognitive problems in the participants [26].
- Determine speech perception difficulties in the presence of noise. This was evaluated with one question: "Do you have difficulty in understanding speech in noisy situations?" There were three response options: yes, no or sometimes. Those who responded yes were entered into the study.
- Measure the informational masking score using the test constructed in Part 1 (primary outcome).
- Administer the synthetic sentence identification test (secondary outcome) [32].
- Determine the SSQ score [31]. The SSQ self-assessment questionnaire will be filled out by the researcher during the preliminary interview. As improving informational masking can improve the quality of speech perception, this questionnaire will be used to measure the speech perception quality of the participants quantitatively (secondary outcome).
- Measure working memory. Working memory is the temporary storage and manipulation of information required to perform a wide range of complex cognitive activities such as learning and reasoning. Since working memory influences performance on informational masking tasks, we will measure the ‘Persian Reading Span test’ score [33]. Participants will be asked to read sets of sentences, report on the semantic acceptability of each sentence, and then recall the final word of each sentence (secondary outcome).
2. 
Providing auditory spatial training (intervention group only)
Auditory spatial training is designed based on five cues that are important in informational masking release: angular separation between target and competing signals [16, 34–36]; signal-to-noise ratio [34]; similarity and difference between the target and competing signals [12]; similar or different gender for target and competing signals [12, 37]; and meaningfulness of the competing signal [12]. Because progression in difficulty is one of the main principles of any auditory training program, the training sessions will be divided into three general steps according to the competing signal. In the first step, meaningless competing signals such as white noise will be used. In the second step, to make the training somewhat more difficult, meaning-carrying signals such as four-talker speech babble will be used. Finally, to increase the difficulty further, sentence materials spoken by male and female talkers will be used. The gender factor is used because gender similarity or difference between target and competing signals is one of the cues that adults use for informational masking release [12, 37]. In all steps, the target signal will be presented from the loudspeaker at 0 degrees azimuth and competing signals from different azimuth angles such as ±45, ±90 and 0 degrees [38, 39]. The difficulty of training will therefore grow within each step as the azimuth angle of the competing signal is reduced [40]. Sentence signals are selected from the Persian version of QuickSIN [41]. Every step of training is implemented as follows: the intensity of the competing signal is fixed at 60 dB SPL, and at the beginning the intensity of the target signal is 70 dB SPL. The first three sentences will be used for familiarization. If an individual needs more practice, more familiarization sentences will be provided. The individual will be asked to identify the keywords heard in the target sentences. 
Feedback will be provided after both correct and incorrect identification. If the individual identifies more than 50% of the keywords, the sentence will be considered correct. At this signal-to-noise ratio, 5 sentences will be presented; if the individual correctly identifies more than 50% of them, the signal-to-noise ratio will be decreased by 5 dB and another 5 sentences will be presented. When the individual can no longer correctly identify more than 50% of the presented sentences at a given signal-to-noise ratio, that signal-to-noise ratio will be taken as the initial level and training will begin. The training will continue for 20 minutes and the intensity will change adaptively: after a correctly identified sentence the intensity will be decreased by 1.5 dB, while it will be increased by 2.5 dB if the individual identifies less than 50% of the words correctly. At each intensity where an individual can correctly identify the sentences, the next sentence will be presented and the above process will repeat. The optimal condition for perceptual auditory learning includes active listening to many repetitions of signals during consecutive training sessions conducted within a short time interval. Since long-term training is not a very suitable option in the clinic [2], training will take place three times a week over a 5-week cycle [42].
3. Assessments after auditory spatial training (interview immediately after and five weeks after training)
The informational masking test (as per Part 1) will be carried out immediately after training and again five weeks later using the Persian list of the coordinate response measure (CRM) corpus, and the results will be compared with the pre-training results (preliminary interview). This score will be the primary outcome. 
The experiments are repeated five weeks after the intervention in order to determine the reliability of the results obtained by the intervention for informational masking release. The informational masking release value will be calculated based on the difference between the sentence recognition scores (in all 30 conditions of signal-to-noise ratio, spatial angle, and gender) in the two noise situations (meaningful and non-understandable). The changes in informational masking before and after the intervention will be calculated across all 30 conditions (see Table S2, Extended data). The ‘synthetic sentence identification test’ [32] and the ‘Persian Reading Span test’ score [33] will also be evaluated immediately after training and 5 weeks later. As the ultimate purpose of this research is to improve the quality of speech perception of elderly people, the SSQ score will be obtained immediately after and 5 weeks after the intervention, and the results of the intervention and control groups will be compared separately. This score and the scores of the ‘synthetic sentence identification test’ and ‘Persian reading span test’ will be the secondary outcomes of the intervention. Figure S2 (Extended data) represents the participant timeline of the second part of the study.
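The adaptive level rule used in the training step (1.5 dB down after a correct sentence, 2.5 dB up after an incorrect one) can be sketched as follows. This is a minimal illustration under stated assumptions: the adjusted quantity is taken to be the target level, a sentence with exactly 50% of keywords identified is treated as incorrect (the text leaves this case open), and all function names are hypothetical.

```python
def sentence_correct(keywords_identified, keywords_total):
    """A sentence counts as correct when more than 50% of keywords are identified."""
    return keywords_identified / keywords_total > 0.5


def next_target_level(level_db, correct, step_down_db=1.5, step_up_db=2.5):
    """Lower the target level after a correct sentence, raise it otherwise."""
    return level_db - step_down_db if correct else level_db + step_up_db


def run_track(start_level_db, responses):
    """Return the sequence of target levels for a list of
    (keywords_identified, keywords_total) responses."""
    levels = [start_level_db]
    for identified, total in responses:
        correct = sentence_correct(identified, total)
        levels.append(next_target_level(levels[-1], correct))
    return levels


# Three hypothetical sentences with five keywords each: correct, incorrect, correct.
print(run_track(70.0, [(4, 5), (1, 5), (3, 5)]))
# [70.0, 68.5, 71.0, 69.5]
```

Because the upward step is larger than the downward step, the track settles at a level where correct sentences outnumber incorrect ones, which keeps the task challenging without becoming discouraging.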

Sample size

The study of Terwee et al. was taken as the basis for determining the sample size of the first part of our study. They suggested that at least 50 patients must be included in each group to evaluate construct validity [43]. In total, 50 young people aged between 20 and 40 years and 50 elderly people aged between 60 and 75 years, with normal peripheral hearing and no difficulty understanding speech in noisy environments, will be recruited. For the second part of the study, the following formula is used to determine the sample size:

$$n = \frac{\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^{2}\left(S_1^{2} + S_2^{2}\right)}{\left(\mu_1 - \mu_2\right)^{2}}$$

$S_1$: standard deviation of the studied variable in the first group (case, exposed, or intervened)
$S_2$: standard deviation of the studied variable in the second group (control, unexposed, or compared)
$\mu_1$: mean of the studied variable in the first group
$\mu_2$: mean of the studied variable in the second group
$\alpha = 0.05$ ($Z_{1-\alpha/2} = 1.96$) and a power ($1-\beta$) of 80%

In this formula, the studied variable is the extent of informational masking changes before and after the intervention. No previous study has used the same training as this study proposes; therefore, we used the study of Delphi et al., which was conducted on a group of elderly individuals [44]. In that study, the mean and standard deviation of the improvement in the main variable were 37.5 and 25.17 in the experimental group and 25.17 and 18.15 in the control group, respectively. Accordingly, the calculated sample size was 14 in each group. Assuming 20% dropout, we will include 18 people in each group.
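The two-group formula and the dropout adjustment can be sketched with the Python standard library. This is an illustration of the arithmetic only: the function names are hypothetical, the means and standard deviations in the first example are placeholders rather than values from the cited study, and dividing by (1 − dropout rate) is one common way to inflate a sample size that happens to reproduce the step from 14 to 18.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(s1, s2, mu1, mu2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = (z_alpha + z_beta) ** 2 * (s1 ** 2 + s2 ** 2) / (mu1 - mu2) ** 2
    return ceil(n)


def inflate_for_dropout(n, dropout=0.20):
    """Increase n so that the expected number of completers is still n."""
    return ceil(n / (1 - dropout))


# Placeholder inputs (not study data): equal SDs of 10 and a mean difference of 10.
print(n_per_group(s1=10, s2=10, mu1=20, mu2=10))  # 16

# Dropout adjustment applied to the per-group size reported in the text.
print(inflate_for_dropout(14))  # 18
```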

Data analysis

Central tendency and dispersion indices (mean and standard deviation) will be used in descriptive analysis of data. In data analysis and for determining the reliability of the Persian version of the coordinate response measure (CRM) corpus, paired t-test and Pearson correlation will be used in the case of normality of data; otherwise, Spearman test will be employed, and one-way ANOVA will be utilized for determining intra-class correlation coefficient. In part 2 of the research, repeated measures ANOVA values will be used for inter-group comparisons and two-way ANOVA will be employed for comparison among groups. SPSS software (V20.0, IBM Corporation, New York, USA) will be used for statistical data analysis and the significance level for all tests will be 0.05.
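For the test-retest reliability analysis, the intra-class correlation coefficient obtained from a one-way ANOVA can be sketched as below. The sketch assumes the one-way random-effects, single-measure form, often written ICC(1,1); the protocol does not state which ICC form will be used, the analysis itself is planned in SPSS, and the scores shown are made up for illustration.

```python
def icc_oneway(scores):
    """ICC(1,1) from one-way ANOVA mean squares.

    scores: list of per-participant lists, each holding k repeated scores
    (k = 2 for a test and a re-test).
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-participant and within-participant sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical percent-correct scores for four participants (test, re-test).
example = [[92, 90], [85, 88], [78, 80], [95, 94]]
print(round(icc_oneway(example), 3))
```

Values close to 1 indicate that most of the score variance lies between participants rather than between the test and the re-test.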

Ethical statement and consent to participate

The Medical Ethics Committee of Iran University of Medical Sciences approved the study protocol (IR.IUMS.REC.1397.303), and the ethical principles of the ethics committee will be observed in this research. Researchers will send any future amendments to the protocol to the ethics committee. One of the researchers of this study will obtain written informed consent from patients willing to participate in the trial (see Extended data). The purpose of the research and its steps will be explained to all participants before the study starts. Participants will be assured of the confidentiality of their data and test results. Participants will be made aware that they can withdraw from the study whenever they want. The tests have no side-effects for the participants, and all tests and training sessions will be free of charge for the participants. Since this will be the first time this training has been performed, we will run a focus group and ask our participants what they think of the intervention and how likely they are to continue performing the auditory training in the future. To promote participant retention and complete follow-up, in every training session the examiner will provide feedback to all participants and inform them about their training progress. The researcher will ask them about the impact of training on their daily conversations. Also, at the start of each training session the initial level of training will be calculated, and if the initial signal-to-noise ratio from which training starts has improved, the examiner will inform the patient. All data will be entered into forms prepared for data collection (see Table S2; Extended data), and the participant files will be stored securely at the study site. Participant files will be maintained in storage for a period of 2 years after completion of the study. 
Only Principal Investigators will be given access to the study data.

Dissemination

The study outcomes will be published through peer-reviewed journals. The data resulting from this study will be released to the audiologists and participants and the general medical community. The results of this trial will be communicated to the external funding body through a formal report. There is no limit in the publication of the trial results.

Study status

The study started in December 2018 and will continue until December 2019. To date, the enrolment of the patients has been performed and the allocation will be performed in the near future.

Discussion

Since elderly populations are growing progressively around the world, the independence of this age group has gained much attention, and Iran is no exception [45]. One of the most important requirements for independent life during aging is the capability for effective verbal communication. Unfortunately, this capability declines in elderly people, especially when tracking speech in environments where several speakers talk with each other. Most elderly individuals complain that despite good hearing, they cannot understand speech in noisy environments [1]. Indeed, elderly people cannot use auditory spatial cues for informational masking release due to the reduction of their auditory processing and cognitive abilities [11, 12]. Since informational masking has an important role in environments with competing signals and rehabilitation programs have not addressed this aspect of masking, designing training that can help elderly people achieve release from this masking is novel. Therefore, if the main research hypothesis (that auditory spatial training can improve informational masking release in elderly people) is confirmed, a series of auditory spatial training sessions based on informational masking release will be made available to audiologists as a therapeutic option for this age group. In addition, the Persian version of the coordinate response measure (CRM) corpus will be prepared, and its reliability and validity determined, for use in research on speech recognition in noisy environments.

Data availability

Underlying data

No data is associated with this article.

Extended data

Open Science Framework: The effects of auditory spatial training on informational masking release in elderly listeners: a study protocol for a randomized clinical trial. https://doi.org/10.17605/OSF.IO/SDEJP [46].

This project contains the following extended data:
- Table S1: The questionnaire of content and face validity of the Persian version of the coordinate response measure (CRM) corpus
- Table S2: The data collection sheet for the second part of the study
- Figure S1: Participant timeline for part 1 of the study (Developing an informational masking measurement test and determining its validity)
- Figure S2: Participant timeline for part 2 of the study (the effect of auditory spatial training on the informational masking release)
- Informed consent form for the participants of the first part of the study
- Informed consent form for the participants of the second part of the study

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Reporting guidelines

Open Science Framework: SPIRIT checklist for ‘The effects of auditory spatial training on informational masking release in elderly listeners: a study protocol for a randomized clinical trial’: https://doi.org/10.17605/OSF.IO/SDEJP [46].

Authors have addressed all concerns. We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Most comments are appropriately addressed by the authors. Please see minor comments below:
The aims in the last paragraph of the introduction should be revised; it is not very clear that the target population is elderly people with normal pure tone audiogram who complain of speech in noise difficulties.
The title should be revised too: '... elderly listeners with speech in noise difficulty...'
Citation is needed for the following sentence: 'In addition, it has been shown that auditory spatial processing ability is lower in the elderly with normal hearing than in young people.'
'Normal hearing' in the text should be changed to 'normal pure tone audiogram'.
'auditory thresholds ≤25dB within the 250–8000Hz frequency range for ensuring the normal pure tone audiogram' as per ASHA protocol?
'normal peripheral hearing' in the inclusion criteria section should be removed.
'complaint about speech in noise perception difficulties' as judged by SSQ?

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

This study has two aims: first, developing and validating a test to measure informational masking, and second, conducting a randomised clinical trial to evaluate the effectiveness of an auditory training in elderly people who have speech in noise difficulty. The introduction is well-written, and the aims are clearly stated; however, the methodology needs to be revised.
The detailed comments are as below:

Introduction

First paragraph: ‘Understanding speech in noisy environments is a major challenge of the auditory system, which occurs mostly due to aging.’ Please revise this sentence. There are many reasons for difficulty understanding speech in background noise; aging is one of them, but it doesn’t occur mostly due to aging. Please first define/explain what energetic and informational masking are (please re-organise your paragraphs, the definition should come first).

Second paragraph: ‘…, another type of masking occurs both.’ Please revise this sentence, it doesn’t make sense.

Fourth paragraph: ‘Based on the results of different studies, it has become clear that the most important sign of informational masking release is spatial separation of target and competing signals’. Here you’ve mentioned several studies but only used one reference; either revise the sentence or add more references.

Fifth paragraph: ‘On the other hand, it has been shown that the elderly population need a higher signal to noise ratio for speech recognition in the presence of noise, compared to young people.’ Please add a reference. ‘in elderly people, without considering the hearing impairment, the ability to use spatial and non-spatial signs for informational masking release diminishes due to the reduction of cognitive processing abilities, temporal processing, defects in the connection between hemispheres, and diminished ability to separate simultaneous sounds.’ By ‘hearing impairment’ do you mean apparent hearing loss on the PTA? Hearing impairment includes temporal (and auditory) processing deficits as well as peripheral hearing impairment.

Last paragraph: ‘The present study had two parts. The first part of this research was developing a test for measuring and evaluating the informational masking.’ The sentences are in the past tense; have you done the study already, or are you going to conduct these?
In the study design section, you mention ‘the study will be conducted...’ Please be consistent. Although you have clarified this in your methodology, your aim will be clearer if you specify here why you are developing and validating a test to measure informational masking.

Study design

‘Inclusion criteria (for all participants in the study): auditory thresholds ≤25dB within the 250–2000Hz frequency range and …’ My main concern is the definition of ‘normal hearing’. Which protocol have you used to define ‘normal hearing’? If you are including patients with mild high frequency hearing loss, you cannot say the patient’s hearing is normal. Even if all frequencies (250-8000 Hz) are better than 20 or 25 dBHL (BSA or ASHA definition), if the OAEs are not robust or wave I of the ABR is absent, you cannot assume the patient’s hearing is normal. The pure tone audiogram is very limited when testing the auditory pathway. In addition, you need to differentiate between peripheral and central hearing. If you would like to use the term normal hearing, it’s better to say ‘peripheral hearing’, as you have not tested the central auditory pathway and so cannot assume their hearing is ‘normal’. The prevalence of hearing impairment (peripheral and central) in elderly people is quite high. Also, it is perfectly possible that an elderly patient has a spatial processing disorder in the presence of a 'normal pure tone audiogram'.

SSQ is not a quality of life questionnaire; please use another term and revise this paragraph.

Please justify why you chose 4 speakers.

Sample size

Please specify what Z in the sample size formula is and include means and standard deviations from the Delphi’s study. The sample size for study 2 is not adequately justified. Sixteen participants seems a very low number, particularly if we assume 20% drop out. How many participants are you going to approach?
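As a hedged illustration of the sample-size questions raised above, the sketch below assumes the common formula for comparing two group means, n = 2(Z(1-α/2) + Z(1-β))²σ²/Δ² per group, and then inflates the result for expected drop-out. This is an assumption for illustration only: the protocol's own formula and inputs are not reproduced in this report, and the function names and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two means (assumed formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z for the type I error rate
    z_beta = NormalDist().inv_cdf(power)           # Z for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def inflate_for_dropout(n, dropout):
    """Number to recruit so that about n remain after the expected drop-out."""
    return ceil(n / (1 - dropout))


# Hypothetical inputs: the SD and the difference are placeholders,
# not values taken from the protocol or from any cited study.
n = n_per_group(sd=1.0, delta=1.0)
print(n, inflate_for_dropout(n, dropout=0.25))
```

Stating the two Z terms separately, as here, makes it explicit which error rate and which power the calculation assumes.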
Other comments

Sufficient time must be considered for auditory training; one month seems a very short time to judge whether auditory training is beneficial or not (in Humes et al.’s methodology the training was performed twice a day for 7.5 weeks, but you are proposing 2 trainings in 4 weeks). Ideally you should repeat your outcome measures after 3 and 6 months, and even better if you could repeat them 1 month after the training has ended to explore the long term potentiation.

Feedback and monitoring are crucial in auditory training; you have mentioned the feedback, but please explain in detail how this will be done.

There was no mention of progression in difficulty of the training (one of the main principles of auditory training); are you going to consider this in your training?

Since you have not done any pilot or feasibility study, it would be useful if you could do a small qualitative study; for example, running a focus group and asking your participants what they think of the intervention and how likely they will continue performing the auditory training in the future.

One of the major issues with the auditory training (or any other training) is boredom and generally keeping the patient motivated throughout. In addition, in practice, performing the training in the clinic is time consuming and costly. Would patients do this training at home? Adhering to the training while performing it at home is another issue.

The consent form needs to be written in lay language; please avoid jargon and acronyms.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Dear Dr. Koohi,

We appreciate your complimentary comments and suggestions.
The following are our point-by-point responses:

1. Your comments about the ‘Introduction’ part were:

First paragraph: Please revise this sentence: ‘Understanding speech in noisy environments is a major challenge of the auditory system, which occurs mostly due to aging.’ And please first define/explain what energetic and informational masking are (please re-organize your paragraphs, the definition should come first).
Response: As you suggested, we have revised the first sentence to: ‘Many older adults complain about speech perception in noisy situations.’ We have also re-organized the paragraph according to your comment.

Second paragraph: ‘Another type of masking occurs both.’ Please revise this sentence, it doesn’t make sense.
Response: We are sorry that this part was not clear in the original manuscript. We have revised the contents of this part.

Fourth paragraph: ‘Based on the results of different studies, it has become clear that the most important sign of informational masking release is spatial separation of target and competing signals’. Here you’ve mentioned several studies but only used one reference, either revise the sentence or add more references.
Response: We have added more references.

Fifth paragraph: ‘On the other hand, it has been shown that the elderly population need a higher signal to noise ratio for speech recognition in the presence of noise, compared to young people.’ Please add a reference.
Response: We have added references.

‘in elderly people, without considering the hearing impairment, the ability to use spatial and non-spatial signs for informational masking release diminishes due to the reduction of cognitive processing abilities, temporal processing, defects in the connection between hemispheres, and diminished ability to separate simultaneous sounds.’ By ‘hearing impairment’ do you mean apparent hearing loss on the PTA? Hearing impairment includes temporal (and auditory) processing deficits as well as peripheral hearing impairment.
Response: Your comment is entirely correct. We now refer to ‘peripheral hearing impairment’ in these sentences.

Last paragraph: ‘The present study had two parts. The first part of this research was developing a test for measuring and evaluating the informational masking.’ The sentences are in the past, have you done the study already or you are going to conduct these? In the study design section, you mention ‘the study will be conducted...’ Please be consistent.
Response: We have corrected the sentences.

Although you have clarified this in your methodology, your aim will be clearer if you specify here why you are developing and validating a test to measure informational masking.
Response: We have explained this and added more sentences to this part.

2. Your comments about the ‘Study design’:

Inclusion criteria: You need to differentiate between peripheral and central hearing. If you would like to use the term normal hearing, it’s better to say ‘peripheral hearing’ as you have not tested the central auditory pathway so you cannot assume their hearing is ‘normal’.
Response: We have revised this criterion accordingly to: “Auditory thresholds ≤25dB within the 250–8000Hz frequency range for ensuring the normal pure tone audiogram or normal peripheral hearing.”

SSQ is not a quality of life questionnaire, please use another term and revise this paragraph.
Response: We have revised this sentence as below: ‘As improving informational masking can improve the speech perception quality of people, this questionnaire will be used to measure the speech perception quality of the participants quantitatively (secondary outcome).’

Please justify why you chose 4 speakers.
Response: We have corrected this part and we will use eight speakers for recording the CRM phrases.

3. Your comments about the ‘Sample size’:

Please specify what Z in the sample size formula is and include means and standard deviations from the Delphi’s study. The sample size for study 2 is not adequately justified.
Sixteen participants seems a very low number, particularly if we assume 20% drop out. How many participants are you going to approach?
Response: We have included all you mentioned in the revised version.

4. Other comments you mentioned were:

1) Training period, which we have corrected as below: ‘The optimal condition for perceptual auditory learning includes active listening to high repetition of signals during the consecutive educational sessions, which is conducted within a short time interval. Since long-term training is not a very suitable option in the, trainings are repeated three times a week completed in 5-week cycles.’

2) Feedback and monitoring are crucial in auditory training, you have mentioned about the feedback but please explain in detail how this will be done.
Response: We have added this to the text.

3) There was no mention of progression in difficulty of the training (one of the main principles of auditory training), are you going to consider this in your training?
Response: We have added more explanation to this part, as below: ‘As one of the main principles of any auditory training program is progression in difficulty so the training sessions will be divided into three general steps by considering the competing signals. In the first step, meaningless competing signals will be used like white noise. In the second step, in order to make the training process somewhat difficult, meaning-carrying signals like speech babble consisting of four speakers will be used. Finally, for making more difficulty, sentence materials with male and female genders will be used.
The reason for using the gender factor is that consideration of gender similarity or difference between target and competing signals is one of the signs that adults use for informational masking release.’

4) Since you have not done any pilot or feasibility study, it would be useful if you could do a small qualitative study; for example, running a focus group and asking your participants what they think of the intervention and how likely they will continue performing the auditory training in the future.
Response: You have raised a very good point and we have added your comment to the text. You can find it in the third paragraph of the ‘Ethical statement and consent to participate’ part.

5) Would patients do these training at home?
Response: Because our study is designed to use loudspeakers, the training will not be done at home. However, this study is a kind of pilot study, and if the training gives good results, it may be done at home under headphones in the future.

6) The consent form needs to be written in lay language and please avoid jargon and acronyms.
Response: We have revised it.

Overall, we think it is an interesting study and the goal of creating a Persian version of the CRM and spatial release tests is a good idea. However, the methods for developing the new test create opportunities for confounds and difficulties interpreting the data. Specific comments are listed below.

In addition, the training arm is not sufficiently different from the new informational masking test for it to be a clinically reliable outcome measure. It would be appropriate to have another task (such as dichotic sentence identification (or equivalent test that has already been validated in the Persian language)) that wasn't trained as an outcome. Furthermore, the lack of an active control condition raises the possibility that any activity involving remembering and responding to stimuli (or even just coming into the testing environment) would change performance.
For detailed discussion of the training issues, the reader is encouraged to consult Green et al. (2019) [1]. Additional fundamental methodological concerns involve the failure to consider the working memory and attention influences that are known to be important for speech in noise for older listeners (Fullgrabe et al., 2015) [2] and for all listeners for informational masking tasks, especially the CRM tasks.

With regard to the new CRM test, it is excellent to create a Persian version, but there are several differences in these methods that could result in substantial differences in the outcomes.

The step size of 5 dB is too large, given the differences in performance that are usually observed. It might be acceptable if a psychometric function were being fit to the data, but the statistical analysis proposed is unlikely to be sufficiently sensitive to the small changes that this method is able to detect, using the standard methods. A more appropriate measure is the target-to-masker ratio at which a fixed level of performance is obtained. For examples of the differences in target-to-masker ratio commonly observed in older and younger listeners with the English version for same and different-gender targets and maskers, the reader is directed to Marrone et al. (2008b) [3] and Gallun et al. (2013) [4].

The use of two male and two female speakers is not sufficient to ensure that the specific speaking styles of the talkers are not influencing the results. English CRM uses four of each gender, and the studies from our lab exclude one of the males due to differences in rate of speaking.

Time alignment of the keywords is an essential aspect of creating an informational masking situation where spatial cues can provide large release from masking. Temporal overlap should be carefully examined, and preliminary testing should establish that none of the talkers is more intelligible than the others in the conditions to be tested.
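To make the recommended measure concrete, the sketch below estimates the target-to-masker ratio (TMR) at a fixed proportion correct by linear interpolation across tested TMRs, and then takes spatial release from masking as the colocated threshold minus the spatially separated threshold. The function names, the 50% criterion and the data are hypothetical, and linear interpolation merely stands in for whatever adaptive track or psychometric fit a given study actually uses.

```python
def tmr_at_criterion(tmrs_db, prop_correct, criterion=0.5):
    """Interpolate the TMR (dB) at which performance reaches `criterion`.

    Assumes tmrs_db is ascending and prop_correct is non-decreasing.
    """
    points = list(zip(tmrs_db, prop_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= criterion <= y1 and y1 > y0:
            # Linear interpolation between the two bracketing points.
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("criterion not bracketed by the measured data")


def spatial_release_db(colocated_tmr, separated_tmr):
    """Spatial release from masking: threshold improvement with separation."""
    return colocated_tmr - separated_tmr


# Made-up illustrative data, not results from any study.
tmrs = [-12, -8, -4, 0, 4]
colocated = [0.10, 0.20, 0.30, 0.50, 0.90]
separated = [0.30, 0.50, 0.80, 0.95, 1.00]
srm = spatial_release_db(tmr_at_criterion(tmrs, colocated),
                         tmr_at_criterion(tmrs, separated))
print(f"SRM = {srm:.1f} dB")
```

A threshold expressed this way is on a continuous dB scale, so it is not limited to the resolution of the presentation step size in the way that percent correct at a few fixed levels would be.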
The use of 90 degrees of spatial separation is large enough that it is possible that changes in spatial ability will not be detected. Multiple studies have established that the difference between 30 and 45 degrees of separation is not very large (Marrone et al., 2008a [5]; Jakien et al., 2017 [6]) and that for people with normal hearing, the effects of age are difficult to detect with separations greater than 15 degrees (Srinivasan et al., 2016 [7]). For this study, this is very important, because if the effects of the training are to improve the ability to use (or even perceive) spatial differences, it is unlikely that this will be a very large change, and so if only very large separations are used, there may be no way to observe the improvements in performance. Jakien and Gallun (2018) [8] provided mathematical equations by which the effects of age can be predicted for 45 degrees of separation. It would be useful to develop similar equations in Aim 1, compare them with the published equations, and use these normative functions to assess improvements in performance after training.

This report was written by Frederick Gallun with advice from Aaron Seitz.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Reviewer 1

Dear Dr. Gallun,

We appreciate your complimentary comments and suggestions. The following are our point-by-point responses:

1. Your concerns about CRM were as below:

1) The step size and use of a target-to-masker ratio.
Response: We have revised this part according to Marrone et al.’s study. You can find it in the revised manuscript (Study procedure, Part 1, the last paragraph).

2) The number of speakers.
Response: You are entirely right; we will use eight speakers for recording the CRM phrases, and we have revised that in the manuscript.
3) The use of 90 degrees of spatial separation is large enough that it is possible that changes in spatial ability will not be detected.
Response: Your comment is very useful. We have added ±45 degrees to the study and have revised the methodology according to your concerns. We will also use the mathematical equations that you mentioned to evaluate the effects of age.

2. Your concerns about training were as below:

1) The training arm is not sufficiently different from the new informational masking for it to be a clinically reliable outcome measure. It would be appropriate to have another task (such as dichotic sentence identification (or equivalent test that has already been validated in the Persian language)) that wasn't trained as an outcome.
Response: We have also added the ‘Farsi language version of synthetic sentence identification test’ to the pre- and post-training parts of the study. We must mention that the ‘dichotic sentence identification test’ has not been adapted to Persian. The original manuscript was revised accordingly.

2) Furthermore, the lack of an active control condition raises the possibility that any activity involving remembering and responding to stimuli (or even just coming into the testing environment) would change performance.
Response: This condition is the same for both groups, so we assume that both groups may learn or remember the test items and that the effect of this on the difference between them will be diminished. Also, as you know, the CRM phrases are context-free, which is an advantage. It means that being in the test environment does not have any large effect on test scores, and anyone who participates in this study must rely only on her or his audibility to identify the correct color and number.

3) Failure to consider the working memory and attention influences.
Response: In response to your concern about working memory effects, we have added a ‘Persian Reading Span test’ to evaluate the working memory ability of the participants.
We will try to match the participants in both groups according to their scores on this test and on the other tests mentioned in the text, in the pre- and post-training conditions. We must mention that we also use the MMSE questionnaire to include only persons who have no salient cognitive impairment.

1. Temporal processing in the aging auditory system.

2. Informational and energetic masking effects in the perception of two simultaneous talkers.

3. A speech corpus for multitalker communications research.

4. The role of perceived spatial separation in the unmasking of speech.

5. Informational and energetic masking effects in the perception of multiple simultaneous talkers.

6. Models of plasticity in spatial auditory processing.

7. The effects of spatial separation in distance on the informational and energetic masking of a nearby speech signal.

8. Note on informational masking.

9. Informational masking in normal-hearing and hearing-impaired listeners.

10. The Speech, Spatial and Qualities of Hearing Scale (SSQ).