Michael Henry Tessler, Noah D. Goodman
Abstract
The meanings of natural language utterances depend heavily on context. Yet, what counts as context is often only implicit in conversation. The utterance it's warm outside signals that the temperature outside is relatively high, but the temperature could be high relative to a number of different comparison classes: other days of the year, other weeks, other seasons, etc. Theories of context sensitivity in language agree that the comparison class is a crucial variable for understanding meaning, but little is known about how a listener decides upon the comparison class. Using the case study of gradable adjectives (e.g., warm), we extend a Bayesian model of pragmatic inference to reason flexibly about the comparison class and test its qualitative predictions in a large-scale free-production experiment. We find that human listeners infer the comparison class by reasoning about the kinds of observations that would be remarkable enough for a speaker to mention, given the speaker and listener's shared knowledge of the world. Further, we quantitatively synthesize the model and data using Bayesian data analysis, which reveals that usage frequency and a preference for basic-level categories are two main factors in comparison class inference. This work presents new data and reveals the mechanisms by which human listeners recover the relevant aspects of context when understanding language.
Keywords: Adjectives; Bayesian cognitive model; Bayesian data analysis; Comparison class; Context; Pragmatics; Rational Speech Act; Reference class
Year: 2022 PMID: 35297089 PMCID: PMC9286384 DOI: 10.1111/cogs.13095
Source DB: PubMed Journal: Cogn Sci ISSN: 0364-0213
Fig 1. Model overview for the example of a listener hearing a speaker describe a basketball player as tall. (A) A hypothesis space of comparison classes is constructed over a taxonomic hierarchy. (B) A comparison class is realized as a probability distribution over the relevant degree (e.g., height; shown in black). Context‐specific probabilistic interpretations of the gradable adjective tall are shown in red for the different comparison classes (facets). (C) The listener imagines what a speaker would say given different possible heights of the referent (x‐axis), assuming different comparison classes (facets); since the listener knows the referent is a basketball player, though, heights towards the upper range of the scale are a priori more likely (opacity). (D) Marginalizing over the a priori plausible heights of the referent, the speaker prefers to say tall over short if the comparison class is other people. (E) The pragmatic listener inverts this speaker model to infer that people is the more likely comparison class given that the speaker said tall. If the referent is described as short, however, the listener infers the speaker meant short for a basketball player. The schematic prior distribution of heights for people is a unit normal distribution, and the heights of basketball players follow a right‐shifted normal with smaller variance.
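The pipeline in Fig 1 can be sketched as a minimal Rational Speech Act model: a literal listener with threshold semantics, a speaker who soft-maximizes informativity, and a pragmatic listener who inverts the speaker to infer the comparison class. This is an illustrative reimplementation, not the authors' code (the paper's models are written in WebPPL); the degree grid, the schematic class priors from Fig 1, the utterance alternatives, and the optimality parameter alpha = 5 are all assumptions here.

```python
import numpy as np

# Discrete grid of degrees (standardized heights)
H = np.linspace(-3, 4, 71)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Schematic comparison-class priors from Fig 1: "people" ~ N(0, 1),
# "basketball players" ~ right-shifted N(1, 0.5) with smaller variance
CLASS_PRIORS = {
    "people": normal_pdf(H, 0.0, 1.0),
    "basketball players": normal_pdf(H, 1.0, 0.5),
}
for c in CLASS_PRIORS:
    CLASS_PRIORS[c] = CLASS_PRIORS[c] / CLASS_PRIORS[c].sum()

UTTERANCES = ["tall", "short", "silence"]

def meaning(u, h, theta):
    # Threshold semantics: "tall" is true of degrees above theta,
    # "short" of degrees below; staying silent is always true
    if u == "tall":
        return h > theta
    if u == "short":
        return h < theta
    return True

def L0(u, theta, cls):
    # Literal listener: comparison-class prior restricted by the literal meaning
    p = np.array([CLASS_PRIORS[cls][i] if meaning(u, h, theta) else 0.0
                  for i, h in enumerate(H)])
    return p / p.sum() if p.sum() > 0 else p

def S1(theta, cls, alpha=5.0):
    # Speaker: soft-max choice among utterances by literal informativity;
    # returns an (utterance x degree) matrix of production probabilities
    lit = np.stack([L0(u, theta, cls) for u in UTTERANCES])
    util = lit ** alpha
    return util / util.sum(axis=0, keepdims=True)

def L1_class(u, referent_prior):
    # Pragmatic listener: infers the comparison class, marginalizing over
    # the referent's degree and a uniform threshold prior (flat class prior)
    ui = UTTERANCES.index(u)
    post = {cls: sum((referent_prior * S1(theta, cls)[ui]).sum() for theta in H)
            for cls in CLASS_PRIORS}
    z = sum(post.values())
    return {cls: v / z for cls, v in post.items()}

# The listener knows the referent is a basketball player
referent_prior = CLASS_PRIORS["basketball players"]
post_tall = L1_class("tall", referent_prior)
post_short = L1_class("short", referent_prior)
```

Under these assumptions the listener favors people as the comparison class after hearing tall, and basketball players after hearing short, mirroring the qualitative pattern in panel (E).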
Fig 2. Predictions of the alternative model, a literal listener that does not represent the speaker's representation of the context as separate from their own. This listener effectively answers the question of which is more likely: a basketball player who is tall, or a person who is tall?
Thirty of the ninety sets of adjectives and categories used in the comparison class inference experiment. Categories were curated from a set of empirically elicited noun phrases from a stimulus generation task (see the SI).
| Adjectives (Scale) | Example Subordinate Classes (Superordinate Class) |
|---|---|
Fig 3. Comparison class inference experimental results. Proportion of paraphrases that contained the subordinate NP (e.g., basketball player) with which the referent was introduced, as a function of the general expectations (background knowledge) listeners have about the category (x‐axis; e.g., gymnasts = low, basketball players = high) and the polarity of the adjective used to describe the category (color; e.g., tall = positive, short = negative). Bars represent overall means, and error bars are bootstrapped 95% confidence intervals. Each dot represents the mean of a single item, and lines connect subordinate NPs described with different adjectives (e.g., tall and short basketball player). Dots are jittered horizontally to improve visual clarity.
Fig 4. Comparison class inference results for 24 of the 90 item sets. Two examples were selected from each unique degree scale; the example items shown are those that exhibited the greatest and smallest variability in comparison class inferences.
Fig 5. Model fits for the main experiment (comparison class inference; top row) and the norming experiment (adjective endorsement; bottom row) for four models that differ in their parameterization of the comparison class prior used inside the comparison class inference RSA model. The flat-prior model assumes all comparison classes are equally likely a priori. The basic‐level-bias model assumes a preference for a basic‐level comparison class. The frequency-effect model assumes the prior probability of a comparison class tracks the frequency of the NP in a corpus. The basic‐level-and-frequency model assumes that the prior probability of a comparison class is a function of both a basic‐level bias and frequency. Dots represent means of the human judgments (proportion of subordinate‐NP responses for the comparison class inference experiment; proportion endorsement for the norming study) and maximum a posteriori estimates of the models' predictions. Lines represent bootstrapped 95% confidence intervals for the data and 95% Bayesian credible intervals for the models.
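The four prior parameterizations compared here can be nested in a single sketch: a flat prior, a basic-level bonus, a frequency weight, or both. The log-linear functional form, the weight values, and the illustrative frequency numbers below are assumptions for exposition; the paper's actual parameterization is given in its SI.

```python
import math

def comparison_class_prior(classes, log_freqs, basic_level,
                           beta_basic=0.0, beta_freq=0.0):
    """Prior over candidate comparison classes (hypothetical log-linear form).

    Setting beta_basic = beta_freq = 0 recovers the flat-prior model;
    beta_basic > 0 gives the basic-level-bias model; beta_freq > 0 gives
    the frequency-effect model; both > 0 gives the combined model.
    """
    scores = {c: math.exp(beta_basic * (c == basic_level)
                          + beta_freq * log_freqs[c])
              for c in classes}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

classes = ["basketball players", "people"]
# Illustrative log frequency ratio: subordinate NPs are rarer in corpora
log_freqs = {"basketball players": -4.0, "people": 0.0}

flat = comparison_class_prior(classes, log_freqs, basic_level="people")
full = comparison_class_prior(classes, log_freqs, basic_level="people",
                              beta_basic=1.0, beta_freq=0.5)
```

With both weights active, the basic-level, high-frequency class (people) dominates the prior, while the flat setting leaves the two classes equiprobable.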
Fig 6. Quantitative modeling results for 10 sets of items (see the SI for the full data‐analytic model). A: Human comparison class inference data and model predictions for ten items. The log ratio of the corpus frequency of the subordinate NP to that of the superordinate NP (used in models with a frequency effect) is shown in brackets next to the x‐axis labels; a more negative number corresponds to a stronger prior belief in the superordinate category as the comparison class. B: Imputed prior distributions over degrees for ten items. Distributions were generated from the maximum a posteriori parameter values inferred by conditioning on the comparison class inference and adjective endorsement data sets. Superordinate‐level category distributions (e.g., heights of people) are assumed to be unit normal distributions for all item sets.
Model evaluation results. The full basic‐level-and-frequency model exhibits the best fit to both datasets in terms of variance explained and mean squared error (MSE). Log Bayes factors are shown with respect to the full model (i.e., negative numbers indicate positive evidence for the full basic‐level-and-frequency model).
| Model | Var. explained (inference) | MSE (inference) | Var. explained (endorsement) | MSE (endorsement) | log BF |
|---|---|---|---|---|---|
| Flat prior | 0.136 | 0.0712 | 0.987 | 0.0032 | −2,569 |
| Frequency effect | 0.222 | 0.069 | 0.98 | 0.0058 | −3,857 |
| Basic‐level bias | 0.715 | 0.0212 | 0.989 | 0.0028 | −215 |
| Basic‐level and frequency | 0.769 | 0.0171 | 0.989 | 0.0028 | 0 |
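The log Bayes factors in the table are differences in log marginal likelihood relative to the full model. As a generic illustration (not the paper's analysis, which conditions its own data-analytic model on the experimental data), a simple Monte Carlo estimate of a log marginal likelihood and the resulting log Bayes factor for a toy coin-flip comparison might look like:

```python
import math, random

def log_marginal_likelihood(log_lik, prior_samples):
    # Monte Carlo estimate of log p(D|M) = log E_prior[p(D|theta)],
    # computed with log-sum-exp for numerical stability
    logs = [log_lik(th) for th in prior_samples]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs) / len(logs))

# Toy example: 8 heads in 10 flips
random.seed(0)
heads, n = 8, 10
log_lik = lambda p: heads * math.log(p) + (n - heads) * math.log(1 - p)

# Model A fixes p = 0.5; model B puts a (near-)uniform prior on p
lml_point = log_marginal_likelihood(log_lik, [0.5])
lml_unif = log_marginal_likelihood(
    log_lik, [random.uniform(0.01, 0.99) for _ in range(5000)])

log_BF = lml_point - lml_unif  # negative: the data favor the flexible model
```

A negative value here plays the same role as the negative numbers in the table: the reference model with the richer prior is favored.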
Fig 7. Imputed distributions over the prior probabilities of comparison classes at different levels of abstraction. Basic‐ and subordinate‐level categories comprise a priori likely comparison classes, while superordinate categories are less likely to serve as comparison classes.