Fabian Tomaschek, Michael Ramscar.
Abstract
The uncertainty associated with paradigmatic families has been shown to correlate with their phonetic characteristics in speech, suggesting that representations of complex sublexical relations between words are part of speaker knowledge. To better understand this, recent studies have used two-layer neural network models to examine the way paradigmatic uncertainty emerges in learning. However, to date this work has largely ignored the way choices about the representation of inflectional and grammatical functions (IFS) in models strongly influence what they subsequently learn. To explore the consequences of this, we investigate how representations of IFS in the input-output structures of learning models affect the capacity of uncertainty estimates derived from them to account for phonetic variability in speech. Specifically, we examine whether IFS are best represented as outputs to neural networks (as in previous studies) or as inputs, by building models that embody both choices and examining their capacity to account for uncertainty effects in the formant trajectories of word-final [ɐ], which in German discriminates around sixty different IFS. Overall, we find that formants are enhanced as the uncertainty associated with IFS decreases. This result dovetails with a growing number of studies of morphological and inflectional families showing that enhancement is associated with lower uncertainty in context. Importantly, we also find that in models where IFS serve as inputs (as our theoretical analysis suggests they ought to), the uncertainty measures provide better fits to the empirical variance observed in [ɐ] formants than in models where IFS serve as outputs. This supports our suggestion that IFS serve as cognitive cues during speech production and should be treated as such in modeling. It is also consistent with the idea that when IFS serve as inputs to a learning network, the distinction is maintained between those parts of the network that represent the message and those that represent the signal. We conclude by describing how maintaining a "signal-message-uncertainty distinction" allows us to reconcile a range of apparently contradictory findings about the relationship between articulation and uncertainty in context.
Keywords: context; cue-to-outcome structure; discriminative learning; enhancement; linguistic knowledge; morphological structure; phonetic characteristics; reduction
Year: 2022 PMID: 35548492 PMCID: PMC9083257 DOI: 10.3389/fpsyg.2022.754395
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. The possible predictive relationships that labels (in morphological terms, series of words and affixes) can enter into with the other features of the world (or other elements of a code). A feature-to-label relationship (A) will facilitate cue competition between features, and the abstraction of the informative dimensions that predict morphological contrasts (e.g., nouns and plural affixes) in learning. By contrast, a label-to-feature relationship (B) will be constrained to simply learning the probability of each feature given the label.
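The contrast between the two architectures in Figure 1 can be illustrated with a minimal error-driven (Rescorla-Wagner-style) delta-rule learner. The toy lexicon, event set, and learning rate below are invented for illustration and are not taken from the paper; the point is only the directional asymmetry in what gets learned:

```python
import numpy as np

def rw_update(W, cues, outcomes, lr=0.1):
    """One Rescorla-Wagner step: error-driven update of cue-to-outcome weights.
    W has shape (n_cues, n_outcomes); cues/outcomes are 0/1 indicator vectors."""
    pred = cues @ W                             # summed activation of each outcome
    W += lr * np.outer(cues, outcomes - pred)   # delta rule on the active cues
    return W

# Toy lexicon: two word forms sharing the final [ɐ], plus two inflectional
# functions (IFS). All names and events are illustrative, not from the paper.
features = ["stem_a", "stem_b", "final_ɐ"]
labels   = ["IFS_1", "IFS_2"]

# (A) feature-to-label: sublexical features are cues, IFS are outcomes,
#     so features compete to predict each IFS.
W_a = np.zeros((len(features), len(labels)))
# (B) label-to-feature: IFS are cues, features are outcomes, so each weight
#     simply tracks the probability of a feature given the label.
W_b = np.zeros((len(labels), len(features)))

events = [
    (np.array([1, 0, 1]), np.array([1, 0])),  # stem_a + [ɐ] -> IFS_1
    (np.array([0, 1, 1]), np.array([0, 1])),  # stem_b + [ɐ] -> IFS_2
]
for _ in range(500):
    for f, l in events:
        rw_update(W_a, f, l)   # direction (A): features predict labels
        rw_update(W_b, l, f)   # direction (B): labels predict features

print(np.round(W_a, 2))
print(np.round(W_b, 2))
```

In direction (A) the shared [ɐ] cue is partly outcompeted by the discriminative stems and converges to a smaller weight (about 0.33 vs. 0.67 for the stems in this toy setup), whereas in direction (B) each IFS label independently learns its feature probabilities, so [ɐ] keeps a full-strength weight of 1 from both labels.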
Figure 2. ML-score difference between model m0 and models m1 to m4. The larger the difference, the better the model's goodness of fit.
Summary of the statistical models using functional input activation and functional output activation as predictors of formant trajectories.
Functional input activation

| Smooth term | edf | Ref.df | F | p-value |
|---|---|---|---|---|
| s(functional input activation): dimension = F1 | 3.7482 | 3.9577 | 39.0716 | <0.0001 |
| s(functional input activation): dimension = F2 | 3.2589 | 3.7180 | 44.7998 | <0.0001 |
| ti(time, functional input activation): dimension = F1 | 7.6804 | 9.7289 | 2.3730 | 0.0079 |
| ti(time, functional input activation): dimension = F2 | 4.6737 | 5.8764 | 4.3388 | 0.0002 |
| | | | | |
| s(functional input activation): dimension = F1 | 3.3729 | 3.7829 | 10.0274 | <0.0001 |
| s(functional input activation): dimension = F2 | 3.8460 | 3.9845 | 94.0980 | <0.0001 |
| ti(time, functional input activation): dimension = F1 | 10.4838 | 12.7473 | 5.2548 | <0.0001 |
| ti(time, functional input activation): dimension = F2 | 7.7378 | 9.4625 | 14.1532 | <0.0001 |
| | | | | |
| s(functional input activation): dimension = F1 | 3.8933 | 3.9921 | 20.9012 | <0.0001 |
| s(functional input activation): dimension = F2 | 3.7390 | 3.9562 | 27.3247 | <0.0001 |
| ti(time, functional input activation): dimension = F1 | 7.3833 | 9.7127 | 1.4650 | 0.1497 |
| ti(time, functional input activation): dimension = F2 | 10.8514 | 12.8554 | 4.6229 | <0.0001 |

Functional output activation

| Smooth term | edf | Ref.df | F | p-value |
|---|---|---|---|---|
| s(functional output activation): dimension = F1 | 1.0020 | 1.0038 | 115.2282 | <0.0001 |
| s(functional output activation): dimension = F2 | 3.8720 | 3.9862 | 12.4216 | <0.0001 |
| ti(time, functional output activation): dimension = F1 | 4.6471 | 6.5973 | 0.5412 | 0.7934 |
| ti(time, functional output activation): dimension = F2 | 3.6281 | 4.2364 | 6.7708 | <0.0001 |
| | | | | |
| s(functional output activation): dimension = F1 | 3.7248 | 3.9528 | 5.1275 | 0.0011 |
| s(functional output activation): dimension = F2 | 3.9479 | 3.9976 | 106.9967 | <0.0001 |
| ti(time, functional output activation): dimension = F1 | 9.6920 | 12.3538 | 3.8719 | <0.0001 |
| ti(time, functional output activation): dimension = F2 | 9.9943 | 12.3734 | 8.4570 | <0.0001 |
| | | | | |
| s(functional output activation): dimension = F1 | 3.2277 | 3.6812 | 21.2965 | <0.0001 |
| s(functional output activation): dimension = F2 | 3.9082 | 3.9942 | 39.8523 | <0.0001 |
| ti(time, functional output activation): dimension = F1 | 8.0654 | 9.6352 | 8.6645 | <0.0001 |
| ti(time, functional output activation): dimension = F2 | 4.9361 | 6.8905 | 3.0888 | 0.0027 |
Summaries of control variables and random effect structure can be found in the .
Figure 3. Estimated trajectories for different word classes (columns) in relation to vowel duration (top), functional output activation obtained from a network with inflectional functions of [ɐ] in the output (middle), and functional input activation obtained from a network with inflectional functions of [ɐ] in the input (bottom). The x-axes represent inverted z-scaled F2 frequencies, such that the left edge points toward the front of the vowel space and the right edge points toward the back of the vowel space. The y-axes represent inverted z-scaled F1 frequencies, such that the top points toward the top of the vowel space and the bottom points toward the bottom of the vowel space. Shades of red represent percentiles for different predictors (optimized for color blindness). The onset of the time course is located at the filled star; the circle in the trajectory represents the center of the vowel.