Romi Zäske, Bashar Awwad Shiekh Hasan, Pascal Belin.
Abstract
Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. 
This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content.
Keywords: Learning and recognition; Speech; TVA; Voice memory; fMRI
Year: 2017 PMID: 28738288 PMCID: PMC5576914 DOI: 10.1016/j.cortex.2017.06.005
Source DB: PubMed Journal: Cortex ISSN: 0010-9452 Impact factor: 4.027
Fig. 1. (A) Six male and six female study voices were presented in separate runs. Each run consisted of 8 study-test cycles in which study speakers were repeated and subsequently tested. At test, participants performed old/new classifications for 12 voices (6 old/6 new). Half of the old and new speakers repeated the sentence from the preceding study phase (same sentence condition), the other half uttered a different sentence (different sentence condition). (B) Trial procedure for one study-test cycle. During the study phase each speaker uttered the same sentence twice in succession. The example shows two trials for the “different sentence condition”: one with an “old” test voice and one with a “new” test voice. Study and test trials were presented in random order.
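The test-phase design described in the caption (12 voices, 6 old and 6 new, half of each in the same sentence condition) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' stimulus script: the function name `build_test_phase`, the speaker labels, and the dictionary keys are hypothetical, and the assignment of speakers to sentence conditions is assumed to be random.

```python
import random

def build_test_phase(old_speakers, new_speakers, seed=None):
    """Build one randomized test phase (hypothetical helper, not from the paper).

    Half of the old and half of the new speakers are assigned to the
    "same" sentence condition, the other half to "different".
    """
    rng = random.Random(seed)
    trials = []
    for novelty, speakers in (("old", old_speakers), ("new", new_speakers)):
        shuffled = list(speakers)
        rng.shuffle(shuffled)  # assumed: random assignment to sentence condition
        half = len(shuffled) // 2
        for i, speaker in enumerate(shuffled):
            sentence = "same" if i < half else "different"
            trials.append(
                {"speaker": speaker, "novelty": novelty, "sentence": sentence}
            )
    rng.shuffle(trials)  # test trials are presented in random order
    return trials

# Placeholder speaker labels; the real stimuli are recorded voices.
old = [f"old_{i}" for i in range(1, 7)]
new = [f"new_{i}" for i in range(1, 7)]
trials = build_test_phase(old, new, seed=0)
for trial in trials:
    print(trial)
```

Each of the four novelty × sentence cells ends up with three trials, which matches the 6 old/6 new split with half of each per sentence condition.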
Accuracies, sensitivity (d′), and response criteria (C) for sentence condition (same/diff), voice novelty condition (old/new), and cycle pairs with standard errors of the mean (SEM) in parentheses.
| Sentence condition | Cycle pair | Old voices (Hits) | New voices (CR) | d′ | C |
|---|---|---|---|---|---|
| Same | 1_2 | .79 (.02) | .43 (.04) | .68 (.11) | −.53 (.09) |
| | 3_4 | .79 (.02) | .46 (.04) | .73 (.13) | −.50 (.07) |
| | 5_6 | .76 (.03) | .47 (.04) | .69 (.12) | −.44 (.08) |
| | 7_8 | .78 (.02) | .54 (.04) | .92 (.13) | −.35 (.07) |
| Different | 1_2 | .54 (.04) | .55 (.03) | .23 (.11) | .02 (.08) |
| | 3_4 | .60 (.04) | .57 (.03) | .49 (.13) | −.04 (.08) |
| | 5_6 | .60 (.04) | .56 (.03) | .45 (.09) | −.06 (.08) |
| | 7_8 | .60 (.04) | .53 (.04) | .37 (.10) | −.10 (.10) |
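As a rough guide to how sensitivity and criterion relate to the hit and correct-rejection (CR) rates in the table, the sketch below uses the standard equal-variance signal detection formulas, d′ = z(H) − z(FA) and C = −0.5 [z(H) + z(FA)], with FA = 1 − CR. It is an assumption that these are the formulas used here; the function name `dprime_and_criterion` is hypothetical. The table values are presumably averaged over per-participant estimates, so applying the formulas to the group mean rates gives similar but not identical numbers.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, cr_rate):
    """Equal-variance signal detection measures (hypothetical helper).

    hit_rate: proportion of old voices classified as "old".
    cr_rate:  proportion of new voices classified as "new".
    Rates of exactly 0 or 1 are not handled and would need a correction.
    """
    z = NormalDist().inv_cdf      # inverse of the standard normal CDF
    fa_rate = 1.0 - cr_rate       # false alarms: new voices called "old"
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Group mean rates for the same sentence condition, cycle pair 1_2.
d_prime, criterion = dprime_and_criterion(0.79, 0.43)
print(f"d' = {d_prime:.2f}, C = {criterion:.2f}")
```

A negative C indicates a liberal bias toward responding "old", which fits the pattern of high hit rates and low CR rates in the same sentence condition.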
Fig. 2. Voice recognition performance as reflected in (A) mean sensitivity d′ and (B) proportion of correct responses, depicted for sentence conditions and pairs of cycles (and voice novelty conditions). Error bars are standard errors of the mean (SEM).
Coordinates of local maxima (MNI space in mm) for BOLD responses in study and test phases as revealed by the whole brain analyses. Effects were significant at the peak level (p < .001, uncorrected) and for the respective clusters (p < .05, FWE; as listed here) and are reported for an extent threshold of 100 voxels. Cluster size reflects the number of voxels per cluster.
| Contrast | Cluster size | p (FWE, cluster) | Z | x y z | Brain region |
|---|---|---|---|---|---|
| sentence effect | | | | | |
| same > diff | n.s. | | | | |
| same < diff | 225 | .005 | 4.64 | 42 −42 −22 | right FG |
| subs. voice memory | | | | | |
| subs. hits > misses | n.s. | | | | |
| subs. hits < misses | n.s. | | | | |
| novelty × sentence | 247 | .003 | 4.36 | −32 24 −6 | left IFG |
| sentence effect | | | | | |
| same > diff | n.s. | | | | |
| same < diff | n.s. | | | | |
| voice novelty effect | | | | | |
| hits > CR | n.s. | | | | |
| hits < CR | 1225 | <.001 | 4.76 | 52 20 26 | right IFG/MFG |
| | 250 | .002 | 4.30 | −16 14 8 | left caudate |
| | 563 | <.001 | 4.17 | 66 −18 4 | right STG |
| | 290 | <.001 | 4.09 | 4 20 54 | right area frontalis intermedia |
| novelty × sentence | 138 | .033 | 3.93 | −38 2 2 | left insula |
| sentence effect | | | | | |
| same > diff | n.s. | | | | |
| same < diff | n.s. | | | | |
| voice novelty effect | | | | | |
| misses > FA | n.s. | | | | |
| misses < FA | n.s. | | | | |
| novelty × sentence | n.s. | | | | |
Note that this voice novelty effect (hits < CR) in the right pSTG was the only significant effect in the ROI analyses of the TVAs. ROI analyses were performed analogously to the whole brain analyses.
Fig. 3. (A) Whole brain analysis of test phases. Brain areas sensitive to voice novelty (hits < CR) irrespective of sentence condition in the right STG, right IFG/MFG, right medial frontal gyrus, and the left caudate. (B) ROI analysis of test phases in bilateral voice-sensitive areas. Reduced activity to studied voices (hits) compared to novel voices (CR), independent of speech content, was observed in the right STG, with no effect of sentence condition.