
Verb Metaphoric Extension Under Semantic Strain.

Abstract

This paper explores the processes underlying verb metaphoric extension. Work on metaphor processing has largely focused on noun metaphor, despite evidence that verb metaphor is more common. Across three experiments, we collected paraphrases of simple intransitive sentences varying in semantic strain (for example, The motor complained → The engine made strange noises) and assessed the degree of meaning change for the noun and the verb. We developed a novel methodology for this assessment using word2vec. In Experiments 1 and 2, we found that (a) under semantic strain, verb meanings were more likely to be adjusted than noun meanings; (b) the degree of verb meaning adjustment, but not noun meaning adjustment, increased with semantic strain; and (c) verb meaning extension is primarily driven by online adjustment, although sense selection also plays a role. In Experiment 3, we replicated the word2vec results with an assessment using human subjects. The results further showed that nouns and verbs change meaning in qualitatively different ways, with verbs more likely to change meaning metaphorically and nouns more likely to change meaning taxonomically or metonymically. These findings bear on the origin and processing of verb metaphors and provide a link between online sentence processing and diachronic change over language evolution.


Keywords: Metaphor; Metaphor processing; Semantic change; Vector space models; Verb metaphor; Verb mutability; word2vec


Year: 2022 PMID: 35587112 PMCID: PMC9285493 DOI: 10.1111/cogs.13141

Source DB: PubMed Journal: Cogn Sci ISSN: 0364-0213

Introduction

Metaphoric uses of verbs are frequent in everyday language. We use phrases like surmounting a problem, eating our words, or stumbling on a solution in ordinary conversation. Research in cognitive linguistics has also documented large systems of conventional metaphors that pervade language, and verb metaphors feature prominently among these (Clausner & Croft, 1997; Fauconnier & Turner, 1998; Gibbs, 2006; Lakoff & Johnson, 1980, 2008; see also Steen, 2007). For example, Lakoff and Johnson (2008) list many verb metaphors among the expressions that constitute the TIME IS MONEY metaphoric system: You are wasting my time. This gadget will save you hours. I do not have the time to give you. How do you spend your time these days? That flat tire cost me an hour. I have invested a lot of time in her. Psychological research on metaphor processing has largely focused on noun–noun metaphors of the form An X is a Y (e.g., My job is a jail, my lawyer is a shark; Blank, 1988; Bowdle & Gentner, 2005; Chiappe & Kennedy, 2001; Gentner & Wolff, 1997, 2000; Gibbs, 1992; Giora, 1997; Glucksberg & Keysar, 1990; Glucksberg, McGlone, & Manfredi, 1997; Jones & Estes, 2006; A. N. Katz, 1989; Keysar, Shen, Glucksberg, & Horton, 2000; Ortony, 1979; Shen, 1989; Thibodeau & Durgin, 2011; Tourangeau & Rips, 1991; Trick & Katz, 1986; Tourangeau & Sternberg, 1981, 1982; Wolff & Gentner, 2011). Psycholinguistic research on metaphoric uses of verbs is comparatively rare (but see Cardillo, Schmidt, et al., 2010; Cardillo, Watson, & Chatterjee, 2017; Cardillo, Watson, et al., 2012; Gentner & France, 1988; Stamenković, Ichien, & Holyoak, 2019; Torreano, Cacciari, & Glucksberg, 2005). The dearth of research on verb metaphor is unfortunate, as there is evidence that verb metaphors are more common than noun metaphors (Jamrozik, Sagi, Goldwater, & Gentner, 2013; Krennmayr, 2011). 
Krennmayr (2011) conducted a corpus analysis over 186,688 words of text spanning multiple registers (news, academic, fictional, and conversational) and found that verb metaphors were more frequent than noun metaphors in all registers. Jamrozik et al. (2013) compared verbs and nouns in terms of what they called metaphoric potential—the likelihood that a word will be used metaphorically. For each word, the researchers randomly sampled 20 sentences from the Corpus of Contemporary American English (Davies, 2009) and asked judges to rate the metaphoricity of the selected word in the sentence. The results showed that, controlling for concreteness and imageability, verb uses were rated as significantly more metaphoric than noun uses.

The verb mutability effect

An early approach to studying verb metaphor in psychology was research on the verb mutability effect in sentence processing (Gentner, 1981; Gentner & France, 1988; Reyna, 1980). Verb mutability refers to the phenomenon whereby, under conditions of semantic strain, the verb is more likely to adapt its meaning to the noun than the reverse. Gentner and France (1988) investigated this effect by having participants paraphrase simple intransitive sentences that varied in semantic strain. They selected eight nouns and eight verbs and combined them factorially to generate 64 sentences (see Fig. 1). The nouns and verbs were selected such that some combinations generated sentences in which the verb received its expected subject type, resulting in semantically unstrained, or literally interpretable, sentences (e.g., The daughter agreed), while other combinations generated sentences in which the noun violated the verb's expected subject type, resulting in semantically strained sentences that were not literally interpretable (e.g., The car agreed).

Fig. 1

Grid showing the stimulus nouns and verbs from Gentner and France (1988), with some examples of sentences generated by combining them. Shaded cells indicate semantically strained combinations; unshaded cells indicate unstrained combinations. Noun–verb combinations used in Experiment 1 fall within the outlined box.

Gentner and France found that, when paraphrasing, people altered the verb meanings more than the noun meanings overall and that this effect increased with semantic strain. Thus, while participants generally preserved the standard meaning of both the noun and the verb when interpreting unstrained sentences (e.g., paraphrasing The daughter agreed as The girl concurred), there was a marked preference for changing the meaning of the verb, and not the noun, when interpreting strained sentences (e.g., paraphrasing The car agreed as The automobile was easily controlled). In other words, under conditions of semantic strain, people tended to interpret the verb metaphorically and the noun literally.

Further evidence for verb mutability in sentence comprehension comes from research on memory. Work going back decades has demonstrated that verbs are harder to remember than nouns in both free‐recall and recognition tasks (H. H. Clark, 1966; Earles & Kersten, 2000, 2017; Earles, Kersten, Turner, & McMullen, 1999; Horowitz & Prytulak, 1969; Kersten & Earles, 2004). Earles et al. (1999) showed that in free recall tests of verb‐noun pairs (e.g., wave‐hand), participants were less able to recall the original verb than the original noun. Kersten and Earles (2004) tested memory for sentences and found the same pattern for recognition: Verbs were recognized less well than nouns overall. More specifically, they found that verbs were significantly less likely to be recognized when combined with a different noun at test than at encoding (e.g., when given The quarter bounced at encoding and The ball bounced at test).
Nouns, however, were recognized equally well at test, regardless of whether the paired verb was the same as or different from the one at encoding (e.g., The quarter bounced at encoding and The quarter rolled at test). Linking their results with Gentner's (1981) verb mutability hypothesis, Kersten and Earles interpreted their findings as evidence that verb encoding is more variable than noun encoding, with the noun providing a stable semantic context to which the verb's meaning is adapted.

Verb mutability has also been demonstrated in studies of meaning coercion imposed by syntactic constraints. For example, in Art sneezed the foam off his beer, the normally intransitive verb sneeze acquires a transitive meaning by virtue of appearing in the transitive double‐object construction (Goldberg, 1995). Kaschak and Glenberg (2000) showed that the interpretation of novel denominal verbs (nouns used in a novel way as verbs; see E. V. Clark & Clark, 1979) depends on the syntactic construction used. For example, when given the double‐object construction Lyn crutched Tom her apple to prove her point, participants interpreted the verb to mean that Lyn conveyed her apple to Tom using a crutch. When given the transitive construction Lyn crutched her apple to prove her point to Tom, participants interpreted crutched as meaning simply that Lyn acted upon the apple in some way using the crutch. In either case, however, the verb's meaning is adjusted to the semantic context provided by the construction and the surrounding nouns.

There is also indirect evidence for verb mutability from historical studies of language change over time (Dubossarsky, Weinshall, & Grossman, 2016; Sagi, 2019). Dubossarsky et al. (2016) compared rates of change for nouns, verbs, and adjectives from 1850 to 2000. They found that verbs changed meaning at a faster rate than both nouns and adjectives over the period of analysis. Dubossarsky et al. suggested that verbs’ greater rate of change over time in language evolution might be driven by their greater mutability in processing, citing Gentner and France's (1988) findings.

Processes underlying verb mutability

Thus, there is evidence from studies of sentence processing, sentence memory, and diachronic meaning change that verbs have a greater propensity for semantic adjustment in context than do nouns. But how does this happen? In general, there are two prominent accounts of how meaning adjustments under semantic strain can take place: sense selection (often called word sense disambiguation) and online adjustment (also called sense creation; e.g., H. H. Clark & Gerrig, 1983; Frisson & Pickering, 2007; Gerrig, 1989; Gerrig & Bortfeld, 1999; Lenat & Guha, 1989; Pritchard, 2019; Rapp & Gerrig, 1999; Vicente, 2018; Vicente & Falkum, 2017). There is little dispute that people often draw on existing word senses to resolve meaning when the typical literal interpretation of a word is contextually implausible. However, Gentner (1981; Gentner & France, 1988) interpreted their verb mutability findings as indicating that verbs are more likely than nouns to undergo online adjustment of their representations. They noted that the online adjustment view provides a way to explain novel metaphoric extensions. For example, interpreting The car agreed as The vehicle drove well would seem to require online modification of the verb, as agreed lacks a conventional metaphoric sense that could be accessed from memory and applied to car. The online adjustment view can also potentially explain the relationship between metaphor and language change. Metaphor is widely believed to be an important force in how words change meaning over time, including how words gain new senses (Bowdle & Gentner, 2005; Cardillo, Watson, Schmidt, Kranjec, & Chatterjee, 2012; Chatterjee, 2010; Dirven, 1985; Heine, 1997; Hopper & Traugott, 2003; Jamrozik, McQuire, Cardillo, & Chatterjee, 2016; Joseph, Hock, & Joseph, 1996; Sweetser, 1990; Traugott, 1988; Wolff & Gentner, 2011; Xu, Malt, & Srinivasan, 2017).
There is evidence suggesting that many conventional metaphoric senses originated as novel extensions of literal concepts. For example, the word heart referred literally to an organ before later gaining metaphoric senses such as the center of things (Dirven, 1985). Similarly, the word bridge originally referred only to a structure linking two physical locations but is now frequently used metaphorically to mean anything that links two abstract situations (Zharikov & Gentner, 2002). Thus, online adjustment may be an important driving force behind polysemy. However, before embracing the online adjustment account of verb mutability, we must first address an alternative explanation, namely, selection among existing word senses. There is evidence that, controlling for frequency, verbs are more polysemous than nouns (Gentner, 1981; Miller & Fellbaum, 1991). Thus, it could be that under semantic strain, it is easier on average to find an appropriate word sense for the verb than for the noun. On this account (the sense selection account of verb mutability), meaning adjustment occurs primarily by selecting among preexisting senses rather than by deriving new meaning online. If sense selection is the primary driver of verb mutability, the verb mutability effect in sentence processing would really be a verb polysemy effect. Given this concern, we evaluated the polysemy of the stimuli used by Gentner and France (1988) and by Kersten and Earles (2004) by counting the number of senses listed for each word in WordNet (Miller, 1995). An independent‐samples t test showed that in both cases, the verbs used were significantly more polysemous than the nouns (ps < .05), leaving open the possibility that differences in polysemy could explain both studies’ results. Thus, sense selection, instead of or in addition to online adjustment, may be the primary driver of verb mutability.
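The polysemy comparison described above can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis script: it computes Student's independent‐samples t statistic (pooled variance) from per‐word sense counts using only the Python standard library. The sense counts are hypothetical placeholders. Real counts could be taken from WordNet (for example, with NLTK's `wordnet.synsets(word, pos=...)`), and a p value could be obtained with `scipy.stats.ttest_ind`; neither library is used here.

```python
import math
from statistics import mean, variance


def pooled_t(xs, ys):
    """Student's independent-samples t statistic (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    nx, ny = len(xs), len(ys)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    standard_error = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(xs) - mean(ys)) / standard_error, nx + ny - 2


# Hypothetical WordNet sense counts, for illustration only (not the actual stimuli).
verb_sense_counts = [11, 9, 14, 7, 12, 10, 8, 13]
noun_sense_counts = [3, 5, 2, 4, 6, 3, 2, 5]

t, df = pooled_t(verb_sense_counts, noun_sense_counts)
print(f"t({df}) = {t:.2f}")
```

A positive t here means the verbs carry more listed senses than the nouns, which is the pattern the text reports for both earlier stimulus sets.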
In this research, we investigate this question by systematically varying both noun and verb polysemy and semantic strain. As in Gentner and France's (1988) paradigm, participants paraphrased simple intransitive sentences that varied in semantic strain, and the paraphrases were then evaluated for the degree of noun and verb meaning change that occurred. Unlike Gentner and France, however, we selected nouns and verbs such that half were low‐polysemy (one to two senses) and half were high‐polysemy (seven to 13 senses). Stimuli were generated by combining the nouns and verbs factorially so that across the full set of sentences, every possible combination of low‐ and high‐polysemy nouns and verbs was realized. If sense selection drives verb mutability, we would expect (a) symmetrical patterns of change for nouns and verbs and (b) a significant relationship between the polysemy of a word and the degree of change under strain. Alternatively, if online adjustment is the primary driver of verb mutability, we would expect (a) greater change in verbs than in nouns; (b) greater change in verb meaning as strain increases; and (c) relatively minor effects of polysemy on the degree of change. Before describing the experiments, however, we must confront the issue of how to assess the degree of semantic change. How does one objectively determine the relative degree of change in the noun versus the verb when someone paraphrases, say, The car agreed as The automobile was easily controlled? Gentner and France (1988) approached this issue using three different behavioral measures. All three provided evidence for the verb mutability effect; however, each had significant drawbacks. We discuss these methods below.
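The factorial crossing of low‐ and high‐polysemy nouns and verbs can be illustrated with a small sketch. The word labels and sense counts are hypothetical placeholders (the actual stimuli are not listed in this section); only the binning rule (one to two senses = low, 7 to 13 senses = high) comes from the text.

```python
from itertools import product


def polysemy_level(n_senses):
    """Bin a WordNet sense count into the two levels described in the text."""
    if 1 <= n_senses <= 2:
        return "low"
    if 7 <= n_senses <= 13:
        return "high"
    return None  # between or outside the two bands: not used as a stimulus


# Hypothetical placeholder words mapped to hypothetical sense counts.
nouns = {"noun_a": 1, "noun_b": 2, "noun_c": 8, "noun_d": 12}
verbs = {"verb_a": 2, "verb_b": 1, "verb_c": 9, "verb_d": 13}

# Full factorial crossing: every noun is paired with every verb.
items = [
    (f"The {noun} {verb}.", polysemy_level(nouns[noun]), polysemy_level(verbs[verb]))
    for noun, verb in product(nouns, verbs)
]

cells = {(noun_level, verb_level) for _, noun_level, verb_level in items}
print(len(items), "sentences covering", sorted(cells))
```

Because the crossing is complete, every (noun polysemy, verb polysemy) cell is populated, which is what lets polysemy and syntactic class be tested as separate predictors.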

Behavioral approaches to assessing semantic adjustment

In the divide‐and‐rate method, a group of raters was instructed to divide each paraphrase into the part that came from the noun and the part that came from the verb; they then rated the similarity of each part to the original noun and verb. The results indicated that the part that came from the verb tended to change more than the part that came from the noun. However, this method was time‐consuming and labor‐intensive. Worse, judges often could not agree on how to divide the sentences, resulting in substantial data loss. For example, in paraphrasing The car limped as the badly functioning vehicle struggled to drive, the modifier badly functioning and the verb phrase struggled to drive seem to owe their presence to both the original noun and verb, making it unclear how to divide them into noun‐ and verb‐derived components.

Gentner and France devised two further methods that did not require dividing the paraphrases into parts: a retrace task and a double‐paraphrase task. In the retrace task, the paraphrases were given to a new group of judges, along with a list of either the initial stimulus nouns or the initial stimulus verbs. For each paraphrase, judges indicated which noun or verb they thought had appeared in the original sentence. Judges were more accurate for nouns than for verbs, indicating that the initial noun meanings had changed less than the initial verb meanings. However, this method had the drawback that the lists of initial stimuli were not designed to test for the degree of semantic change in either nouns or verbs. In the double‐paraphrase task, the original paraphrases were given to a new set of participants to paraphrase. The rate at which the initial nouns and verbs resurfaced in the new paraphrases was taken to reflect the degree of meaning adjustment that had occurred: the greater the change in a word's meaning in the initial paraphrase, the less likely the word was to resurface in the second paraphrase. Consistent with the verb mutability effect, nouns were more likely to resurface than verbs. The double‐paraphrase task had the advantage of being the most hands‐off approach of the three; however, it too resulted in substantial data loss: only 19% of nouns and 4% of verbs resurfaced.

In sum, all three of Gentner and France's methods indicated greater change of meaning in the verb than in the noun. However, none of them was ideal: The divide‐and‐rate and double‐paraphrase techniques were liable to considerable data loss, and the retrace method was limited by the particular word sets chosen initially. Therefore, in the present work, we turn to techniques for computing relatedness between texts that have since emerged from work in computer science and computational linguistics: vector space word embedding models (WEMs).
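The resurfacing measure of the double‐paraphrase task amounts to a simple proportion, sketched below. The matching rule (case‐insensitive exact word match) is an assumption; the section does not say how inflected or derived forms were treated, and exact matching would miss, for example, agree when the original was agreed. The second‐round paraphrases are hypothetical.

```python
import re


def resurfacing_rate(original_word, second_paraphrases):
    """Proportion of second-round paraphrases in which the original word reappears.

    Matching is a case-insensitive exact word match (an assumption).
    """
    if not second_paraphrases:
        return 0.0
    target = original_word.lower()
    hits = sum(
        target in re.findall(r"[a-z']+", text.lower()) for text in second_paraphrases
    )
    return hits / len(second_paraphrases)


# Hypothetical second-round paraphrases of "The vehicle responded well to the driver",
# itself a first-round paraphrase of "The car agreed".
second_round = [
    "The car handled well",
    "The automobile was easy to steer",
    "The car obeyed the driver",
]
print(resurfacing_rate("car", second_round))     # noun resurfacing
print(resurfacing_rate("agreed", second_round))  # verb resurfacing
```

The sketch also shows why the method loses data: a word that never resurfaces contributes only a zero, with no graded information about how far its meaning moved.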

Using vector space WEMs to assess semantic adjustment

In this research, we use word2vec (Mikolov, Chen, Corrado, & Dean, 2013; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013), a vector space WEM, to assess the degree of semantic change between a stimulus word and its paraphrase. WEMs take as their foundation the notion that words are similar or related to the extent that they appear in similar contexts. WEMs are trained on a large corpus and derive a vector representation for each word (typically 100 to 300 dimensions) based on global distributions of co‐occurrence patterns in the corpus (an overview of the free parameter choices involved in training and using word2vec is included in the Supplementary Material). The similarity between any two word meanings is typically calculated by taking the cosine of the angle between their two associated vectors, resulting in a score between −1 and 1. Scores close to 1 are taken to indicate high levels of similarity, and scores close to 0 indicate low levels of similarity. The logic of our approach is to use word2vec's cosine similarity scores to estimate the similarity between the paraphrase and the original verb (or noun) and therefore the degree of change under paraphrase. A high cosine score between a verb or noun and its paraphrase is taken to indicate that the meanings are highly similar, and therefore that the initial word's meaning was not much altered in that paraphrase. By the same logic, a low cosine similarity score is taken to indicate a high degree of semantic change (see details in Experiment 1). We used pretrained word2vec vectors publicly available from Google, which were trained on a 100‐billion word subset of the Google News corpus, resulting in a vocabulary of over 3 million words. We chose word2vec over other WEMs based on a study by Pereira, Gershman, Ritter, and Botvinick (2016), which compared several prominent off‐the‐shelf WEMs, including word2vec, GloVe (Pennington, Socher, & Manning, 2014), and LSA (Landauer & Dumais, 1997). 
Word2vec and GloVe were the best performing of the test set, providing the highest correlations with human similarity judgments on almost all of the 17 datasets tested. We chose word2vec over GloVe because it is more widely used than any other WEM; its two foundational papers have been cited more than a combined 53,000 times. We used word2vec's set of pretrained vectors rather than training our own version because we wanted a general‐purpose language corpus, not one aimed at verb (or noun) metaphor. Using pretrained vectors also minimizes the opportunity for inadvertently tailoring the space to fit the predicted results and removes the need to make free‐parameter choices. A further advantage is that the results can be more easily replicated and compared.
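For reference, the cosine similarity between two word vectors u and v with d dimensions is the standard formula below (d = 300 for the pretrained Google News vectors). Because the value depends only on the angle between the vectors, it lies between −1 and 1 and is unaffected by vector length.

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^{2}} \, \sqrt{\sum_{i=1}^{d} v_i^{2}}},
  \qquad -1 \le \cos(\mathbf{u}, \mathbf{v}) \le 1
```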

Sense selection versus online adjustment

Using word2vec, we investigated the question of sense selection versus online adjustment in verb mutability. If mutability is driven chiefly by sense selection, then polysemy should predict mutability: Low‐polysemy nouns and verbs should show little semantic change, while high‐polysemy nouns and verbs should show substantial meaning change under semantic strain. We would further expect that when a high‐polysemy noun is combined with a low‐polysemy verb, the noun, and not the verb, should change meaning. Overall, when controlling for polysemy, there should be little or no difference in meaning change by syntactic class. In contrast, the online adjustment view posits that verb mutability results primarily from online processes that alter the verb's typical representation to fit with the noun's meaning. In this case, we would expect by‐class (rather than by‐polysemy) differences: If online adjustment of verb meaning is the driver of verb mutability, then verbs should change meaning more than nouns regardless of polysemy, and noun meanings should remain stable at both low and high polysemy. Of course, both processes may be involved, in which case we should find that high‐polysemy words are more mutable than low‐polysemy words, but that overall, verbs are more mutable than nouns.

The plan of this paper is as follows. In Experiment 1, we carried out a partial replication of Gentner and France's (1988) Experiments 1 and 2, but using word2vec to assess meaning change instead of human judges. The goal was to replicate the verb mutability effect and also to test the feasibility of using word2vec to assess semantic change in a sentence processing context. In Experiment 2, we compared the sense‐selection and online adjustment accounts of mutability by testing polysemy as a predictor of meaning change, once again using word2vec to assess degree of change.
In Experiment 3, we reassessed the paraphrases from Experiment 2, using human judges instead of word2vec to rate the degree of semantic change. The idea was to ascertain whether word2vec's results conform to human intuitions.

Experiment 1

In Experiment 1, we sought to replicate the findings of Gentner and France (1988) using word2vec to assess semantic adjustment. As in the original work, we asked participants to paraphrase sentences varying in semantic strain. We then used word2vec to assess the degree of change in the noun and in the verb as described below.

Method

Participants

A total of 121 university undergraduates completed the study in person in the laboratory. They received course credit in an introductory psychology class for their participation. Five were excluded for not being native English speakers, and seven were excluded for failing the catch trial criteria, for a net of 109 participants.

Materials and design

Stimuli consisted of a subset of those used in Gentner and France's Experiments 1 and 2. Gentner and France generated stimulus sentences by combining eight nouns with eight intransitive verbs for a total of 64 different sentences, which can be visualized as forming a matrix (see Fig. 1). The nouns consisted of two humans, two animals, two artifacts, and two abstract nouns. The verbs were matched to the nouns with respect to their preferred subject type: two verbs prefer human subjects, two prefer animals (or humans), two prefer artifacts (or animals or humans), and two prefer abstract nouns (or any of the other three categories). Arranging the nouns and verbs into a matrix allows semantic strain to be varied systematically, as shown in Fig. 1. When the noun meets the verb's selectional preference, the result is a literal, unstrained sentence; when the noun violates the verb's selectional preference, the result is a semantically strained (nonliteral) sentence. For example, agree prefers a human subject, so The daughter agreed is unstrained, but The car agreed is semantically strained. We used Gentner and France's original stimuli with one modification: We excluded the abstract category, leaving six nouns and six verbs for a total of 36 of the original 64 sentences (see Fig. 1). This was done for two reasons. First, it simplified and balanced the design such that each participant received the same mix of strained and unstrained sentences while seeing each stimulus noun and verb exactly once. Second, many of the original sentences involving abstract nouns seemed awkward (e.g., The responsibility succeeded). We were concerned that participants might not be able to provide meaningful interpretations of these sentences, which in turn might bias the results toward greater mutability (i.e., they might yield many meaningless but nevertheless semantically distant adjustments).
Removing the abstract category, therefore, provides a stricter test of the verb mutability effect.
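The selectional‐preference logic behind the reduced 6 × 6 grid can be expressed compactly. In this sketch (an illustrative encoding, not the authors' materials), categories are ranked human > animal > artifact, and a verb that prefers a given category also accepts any higher‐ranked subject, following the description above (e.g., verbs that prefer animals also accept humans). A sentence counts as strained when the noun ranks below the verb's preferred category, so agree, a human‐preferring verb, gives an unstrained The daughter agreed and a strained The car agreed.

```python
# Rank of each subject category in the preference hierarchy described in the text.
RANK = {"artifact": 0, "animal": 1, "human": 2}


def is_strained(noun_category, verb_preference):
    """True if the noun falls below the verb's preferred subject category."""
    return RANK[noun_category] < RANK[verb_preference]


# Two nouns and two verbs per category, as in the reduced 6 x 6 design.
noun_categories = ["human", "human", "animal", "animal", "artifact", "artifact"]
verb_preferences = ["human", "human", "animal", "animal", "artifact", "artifact"]

grid = [[is_strained(n, v) for v in verb_preferences] for n in noun_categories]
n_strained = sum(cell for row in grid for cell in row)
print(f"{n_strained} strained, {36 - n_strained} unstrained")

# agree prefers humans: "The daughter agreed" vs. "The car agreed"
print(is_strained("human", "human"), is_strained("artifact", "human"))
```

Under this encoding the grid contains 12 strained and 24 unstrained sentences, consistent with six item groupings of two strained and four unstrained sentences each.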

Design

So that each participant saw each noun and verb exactly once, the 36 stimulus sentences were divided into six between‐subject item groupings of six sentences each. Each grouping consisted of two strained and four unstrained sentences. Thus, the design was 6 (item grouping, between‐subject) × 2 (item strain: strained vs. unstrained, within‐subject). Two simple unstrained sentences were included as catch trials to check attention and compliance with the instructions; a participant was excluded for repeating a noun and/or verb in both catch trials or for producing an obviously nonsensical answer in either. As each of the 109 retained participants paraphrased six initial sentences, there were roughly 18 paraphrases per initial sentence.
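The section does not say how the six item groupings were constructed. One construction that satisfies the stated constraints (each noun and verb once per grouping, every sentence in exactly one grouping, two strained and four unstrained sentences per grouping) is a cyclic Latin square, sketched below. The noun and verb orderings are assumptions chosen so that each grouping contains exactly two strained sentences; they are not the authors' assignment.

```python
RANK = {"artifact": 0, "animal": 1, "human": 2}

# Assumed orderings: nouns grouped by category, verb preferences interleaved.
noun_categories = ["human", "human", "animal", "animal", "artifact", "artifact"]
verb_preferences = ["human", "animal", "artifact", "human", "animal", "artifact"]


def cyclic_groupings(n):
    """Grouping g pairs noun i with verb (i + g) mod n (a cyclic Latin square)."""
    return [[(i, (i + g) % n) for i in range(n)] for g in range(n)]


groupings = cyclic_groupings(6)
strained_counts = []
for g, pairs in enumerate(groupings):
    # A pair is strained when the noun ranks below the verb's preferred category.
    strained = sum(
        RANK[noun_categories[i]] < RANK[verb_preferences[j]] for i, j in pairs
    )
    strained_counts.append(strained)
    print(f"grouping {g}: {strained} strained, {len(pairs) - strained} unstrained")
```

With 109 participants each paraphrasing six sentences (654 paraphrases) spread over 36 sentences, each sentence receives 654 / 36 ≈ 18.2 paraphrases on average, in line with the figure given above.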

Procedure

Each participant was randomly assigned to one of the six item groupings. Participants completed the experiment individually, in person, on a computer. They first read instructions informing them that they would see a number of different sentences and that they should provide a meaningful interpretation of each. They were explicitly instructed not to translate sentences mechanically (word‐by‐word) but rather to think of a plausible overall meaning for the sentence. To illustrate the difference between a mechanical and meaningful paraphrase, they were provided with an example of each. The full instructions can be found in Appendix A. Sentences were presented one at a time in randomized order, and participants typed their responses. Once they had submitted a response for a sentence, they could not go back to previous responses.

Coding

Two human coders, blind to the hypotheses, screened the paraphrases to exclude blatantly noncompliant responses (e.g., paraphrasing The daughter cooked as The child) and responses that did not constitute a meaningful interpretation of the sentence. Two types of response fell under this second criterion: (a) responses that described the context suggested by the initial sentence rather than actually interpreting it (e.g., paraphrasing The mule shivered as It was a cold night) and (b) mechanical, word‐by‐word paraphrases of strained sentences (e.g., paraphrasing The lantern worshipped as The candle honored). As noted above, participants were explicitly instructed to interpret the intended meaning rather than deal with each word separately. Of course, for unstrained sentences (which are literally interpretable), a meaningful paraphrase is indistinguishable from a word‐by‐word paraphrase (e.g., paraphrasing The daughter worshipped as The girl prayed). Thus, coding for mechanical paraphrases was necessary only for the strained sentences; however, all paraphrases were coded for situation descriptions and for noncompliance. The two coders judged all 654 paraphrases. Each coder was presented with the original sentence and all corresponding paraphrases and indicated whether each paraphrase was meaningful, mechanical, describing the situation, or noncompliant. Coding was done in chunks: each judge coded a set of paraphrases independently, followed by a reconciliation session in which the judges resolved any disagreements. The judges reached a final consensus on all items. Coding resulted in the exclusion of 128 paraphrases (91 mechanical, 24 describing the situation, and 13 noncompliant), leaving 526 of the original 654 paraphrases for the main analysis. Cohen's κ was computed to assess interrater reliability. There was moderate initial agreement between the two judges, κ = 0.66, 95% CI [0.57, 0.75], p < .001. A summary of the results of the coding task is shown in Appendix B. After coding, an average of 14.61 paraphrases per item remained (526 paraphrases over 36 items).
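Cohen's κ compares the observed proportion of agreement between two coders, p_o, with the agreement expected by chance from each coder's label frequencies, p_e: κ = (p_o − p_e) / (1 − p_e). A minimal standard‐library sketch follows; the labels are hypothetical and do not reproduce the reported κ = 0.66, and the confidence interval is not computed here.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty list of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: sum over labels of the product of the two marginal proportions.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1:  # both coders used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes using the four categories described above.
coder_1 = ["meaningful", "meaningful", "mechanical", "mechanical", "situation", "noncompliant"]
coder_2 = ["meaningful", "mechanical", "mechanical", "mechanical", "situation", "noncompliant"]
print(round(cohens_kappa(coder_1, coder_2), 2))
```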

Assessing semantic adjustment

For each paraphrase, word2vec was used to obtain two cosine similarity scores, a noun score and a verb score, indexing how much semantic adjustment the initial noun and verb, respectively, underwent from the original sentence to the paraphrase (lower scores indicate more adjustment). The scoring process was as follows. First, separate normalized vectors were obtained for each stimulus noun and verb. Next, a vector for each paraphrase was generated by averaging its normalized component word vectors. The noun‐change score was then computed as the cosine similarity between the original noun vector and the paraphrase vector; likewise, the verb‐change score was calculated as the cosine similarity between the original verb vector and the paraphrase vector. Comparing the initial noun and verb to the entire paraphrase has the advantage of eliminating the need to divide paraphrases into components. For example, to assess the amount of meaning change that occurred for the noun and verb from the stimulus sentence The lantern limped to the paraphrase The candle flickered, we calculated the cosine of the angle between the vector for lantern (the original noun) and the vector for The candle flickered (the participant's paraphrase), and likewise for limped (the original verb) and The candle flickered. The resulting noun and verb scores were 0.47 and 0.22, indicating that the verb's meaning changed more than the noun's in this paraphrase (recall that for WEMs, scores closer to 0 indicate a lower degree of similarity between items).
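The scoring procedure can be sketched in plain Python. The three‐dimensional toy vectors are invented for illustration and will not reproduce the 0.47 and 0.22 obtained with the real 300‐dimensional vectors. Two further points are assumptions rather than details given in the text: paraphrase words missing from the vocabulary (here, The) are simply skipped, and in practice the pretrained Google News vectors could be loaded with gensim's `KeyedVectors.load_word2vec_format(path, binary=True)`, whose result supports the same `word in vectors` and `vectors[word]` lookups used below.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def paraphrase_vector(words, vectors):
    """Average of the unit-normalized vectors of the in-vocabulary paraphrase words."""
    known = [normalize(vectors[w]) for w in words if w in vectors]
    if not known:
        raise ValueError("no paraphrase word is in the vocabulary")
    return [sum(column) / len(known) for column in zip(*known)]


def change_scores(noun, verb, paraphrase_words, vectors):
    """(noun score, verb score); lower values indicate more semantic change."""
    p = paraphrase_vector(paraphrase_words, vectors)
    return cosine(normalize(vectors[noun]), p), cosine(normalize(vectors[verb]), p)


# Toy vectors, invented for illustration only.
toy_vectors = {
    "lantern": [1.0, 0.2, 0.0],
    "limped": [0.0, 0.1, 1.0],
    "candle": [0.9, 0.3, 0.1],
    "flickered": [0.4, 1.0, 0.1],
}
noun_score, verb_score = change_scores(
    "lantern", "limped", ["The", "candle", "flickered"], toy_vectors
)
print(f"noun {noun_score:.2f}, verb {verb_score:.2f}")
```

Averaging unit‐normalized vectors keeps any one high‐norm word from dominating the paraphrase vector, and comparing each original word against the whole paraphrase avoids the noun/verb division problem that hampered the divide‐and‐rate method.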

Results

To preview, the results bore out the two key findings necessary for a successful replication of Gentner and France's (1988) results: (a) overall, the change in meaning was greater for verbs than for nouns, and (b) this effect was greater for semantically strained sentences than for unstrained sentences. As Fig. 2a shows, verb meanings changed markedly in response to strain, while noun meanings remained stable. Table 1 shows example paraphrases for strained and unstrained sentences.

Fig. 2

Noun and verb similarity scores from Experiment 1. Lower scores indicate greater semantic adjustment. Error bars/bands represent 95% confidence intervals. (a) Strain treated as a categorical predictor. (b) Strain as a continuous predictor, derived from the comprehensibility ratings.

Table 1

Example paraphrases from Experiment 1

Condition	Stimulus Sentence	Paraphrase
Unstrained	The daughter cooked	The girl made food
	The politician shivered	The statesman quivered
	The mule limped	The horse walked gingerly
Strained	The car agreed	The vehicle responded well to the driver
	The lantern limped	The candle flickered
	The lizard worshipped	The amphibian laid out in the sun

To test whether verb meanings changed more than noun meanings overall, a difference score for each paraphrase was calculated by subtracting the verb cosine score from the noun cosine score. Since lower word2vec scores indicate less relatedness between items, a positive difference score indicates greater verb change than noun change. Next, a linear mixed‐effect model was fit, with the difference score as the dependent measure, the intercept (representing the mean difference score) as the only fixed effect, and subjects and items as random effects. The intercept was found to be significantly greater than 0, β = 0.11, SE = 0.02, t = 5.35, p < .001, indicating that, on average, verbs (M = 0.26, SD = 0.11) changed their meaning significantly more overall than nouns did (M = 0.38, SD = 0.15).

To test the effect of semantic strain on the degree of meaning change, two additional models were fit: one for nouns and one for verbs. In both models, the word2vec score was the dependent measure, strain (unstrained vs. strained) was the fixed effect, and subjects and items were included as random effects. For verbs, the effect of semantic strain was significant, β = −0.26, SE = 0.08, t = 3.09, p < .01, indicating that verb meaning was adjusted to a greater extent in the strained condition (M = 0.21, SE = 0.02) than in the unstrained condition (M = 0.28, SE = 0.01). For nouns, there was no significant effect of semantic strain, β = 0.07, SE = 0.10, t = 0.66, p = .51. These results are shown in Fig. 2a.
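The difference score used in the first analysis is the noun cosine minus the verb cosine for each paraphrase. The sketch below computes it, together with a raw mean and standard error, for hypothetical scores. It does not fit the linear mixed‐effect model with subject and item random effects from which the reported β comes, so its mean will not in general equal that intercept.

```python
import math
from statistics import mean, stdev


def difference_score(noun_cos, verb_cos):
    """Noun cosine minus verb cosine; positive means the verb changed more."""
    return noun_cos - verb_cos


def summarize(score_pairs):
    """Raw mean and standard error of the difference scores (no random effects)."""
    diffs = [difference_score(n, v) for n, v in score_pairs]
    return mean(diffs), stdev(diffs) / math.sqrt(len(diffs))


# The worked example from the text: lantern/limped vs. "The candle flickered".
print(round(difference_score(0.47, 0.22), 2))

# Hypothetical (noun, verb) cosine pairs for a handful of paraphrases.
pairs = [(0.47, 0.22), (0.40, 0.31), (0.35, 0.30), (0.52, 0.20)]
m, se = summarize(pairs)
print(f"mean difference {m:.3f} (SE {se:.3f})")
```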

Obtaining direct ratings of semantic strain

In the analyses so far, we have followed Gentner and France's original procedure wherein strain was treated as a categorical predictor, with sentences categorized as either strained or unstrained based on whether the verb received its expected noun subject type (represented by the shaded squares in Fig. 1). Although this provides a principled way to classify strained versus unstrained sentences, treating strain as a dichotomous predictor fails to capture the intuition that some sentences are more strained than others (e.g., consider The mule agreed vs. The lantern agreed). To provide a finer‐grained continuous measure, we obtained direct ratings of sentence comprehensibility from a new group of 43 undergraduates. They were asked to rate, on a scale of 1 to 10, how easy or hard they thought it would be for a “typical person” to understand each of the stimulus sentences, with 1 meaning very hard for most to understand and 10 meaning very easy for most to understand. Each participant rated 12 of the 36 target items and four fillers, resulting in 11 ratings for each target item. On the assumption that high comprehensibility corresponds to low strain (and low comprehensibility to high strain), we inverted the scale so that a score of 0 corresponded to the least amount of strain possible, and a score of 9 corresponded to the maximum amount of strain possible. The mean ratings and standard errors for each item are provided in Appendix C. Next, we reanalyzed the data from Experiment 1 using the new continuous measure of strain as the fixed effect. The results replicated the previous findings. There was a significant main effect of semantic strain for verbs, β = −0.38, SE = 0.08, t = 4.89, p < .001, but not for nouns, β = −0.04, SE = 0.11, t = 0.39, p = .70 (see Fig. 2b). 
Notably, the standardized slope coefficient for verbs was larger in magnitude with the continuous measure of strain (−0.38) than with the categorical measure (−0.26), suggesting that the continuous measure of strain was indeed more sensitive than the categorical one. Based on this finding, in the remaining experiments we likewise obtained direct strain ratings of the stimulus items and used the continuous predictor in the analyses.
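The scale inversion described above (comprehensibility ratings of 1 to 10 mapped onto strain scores of 9 to 0) and the per-item summary can be sketched as follows. The ratings in the example are hypothetical, and reporting the standard error as the sample standard deviation divided by the square root of the number of raters is an assumption about how the Appendix C values were computed.

```python
import math
import statistics


def rating_to_strain(rating):
    """Invert a 1-10 comprehensibility rating into a 0-9 strain score."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating {rating} is outside the 1-10 scale")
    return 10 - rating


def item_strain(ratings):
    """Mean strain for one item and its standard error across raters."""
    strains = [rating_to_strain(r) for r in ratings]
    mean = statistics.mean(strains)
    if len(strains) < 2:
        return mean, 0.0
    se = statistics.stdev(strains) / math.sqrt(len(strains))
    return mean, se


# Hypothetical ratings from five raters for a single, easily understood sentence.
mean, se = item_strain([9, 10, 8, 10, 9])
print(mean, round(se, 3))
```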

Discussion

The results of Experiment 1 demonstrate a verb mutability effect, replicating Gentner and France's (1988) original findings. First, verb meanings were found to change significantly more than noun meanings overall. Second, semantic strain predicted verb change but not noun change. In the categorical model, verbs in strained sentences changed more than verbs in unstrained sentences, while noun scores were nearly identical in the two conditions. In the continuous model, the degree of verb change increased linearly with the degree of semantic strain, while noun change remained flat. This shows that, as predicted, verbs changed their meaning more readily than nouns and were the locus of change in resolving semantically strained utterances. Table 1 shows example paraphrases of unstrained and strained sentences from Experiment 1. In addition, the fact that the patterns of meaning change found using word2vec replicate Gentner and France's past results using human judges is encouraging evidence that word2vec is capable of capturing human intuitions regarding semantic adjustment in a sentence processing context. Of course, a more direct comparison between word2vec scores and human judgments is needed; we provide such a test in Experiment 3.

Before moving on, however, two further concerns need to be addressed. One concern is whether our results are confounded by a relationship between strain and paraphrase length. It may be that strained sentences require more words to interpret than unstrained sentences (e.g., compare The daughter agreed → The girl concurred vs. The car agreed → The vehicle responded well to the driver). This might artificially depress word2vec scores by making the noun or verb less similar to any single word in the paraphrase. A closer look at paraphrase lengths, however, alleviates this concern.
Mean paraphrase length in Experiment 1 varied little with strain: the average paraphrase length was 3.94 for the least‐strained item and 4.25 for the most‐strained item. A mixed effect linear regression confirmed that there was no significant relationship between semantic strain and net paraphrase length (i.e., excluding stop words such as “the” that were not included in the word2vec model), β = −0.01, SE = 0.05, t = 0.24, p = .82. In addition, for both nouns and verbs, there was no significant relationship between net paraphrase length and word2vec score. That is, the mean noun cosine similarity score of the longest paraphrases did not differ significantly from that of the shortest paraphrases (β = −0.06, SE = 0.04, t = 1.47, p = .14), and likewise for verbs (β = −0.004, SE = 0.04, t = 0.10, p = .93). Thus, the observed effects of strain do not appear to be attributable to paraphrase length.

A second concern is whether omitting mechanical paraphrases from the analysis could have distorted the findings. Some of the initial sentences (eight out of 36) had a high proportion of paraphrases that were coded as mechanical and were therefore not included in the analysis. The mean strain rating of these items was higher than the overall mean strain rating (5.74 vs. 3.81), meaning that there were many instances where participants did not produce meaningful interpretations of highly strained items and instead provided a word‐by‐word transcription. This is not entirely surprising: strained sentences should be more difficult to interpret, which may lead some participants to give up or leave them unable to provide a meaningful paraphrase. However, the loss of data among the high‐strain items is problematic. To address this concern, we reran our analyses on the full dataset, that is, without excluding any mechanical or noncompliant paraphrases. The results were the same: there was a significant main effect of semantic strain for verbs but not for nouns.
Further, the word2vec scores for the eight items with high rates of paraphrase exclusion matched the overall pattern. The average cosine similarity score for these items was 0.20 for verbs and 0.32 for nouns, indicating that verbs changed more than nouns even among these items. Thus, the verb mutability effect appears to hold consistently across all items, including those with the highest rates of noncompliant paraphrases.

To summarize, in Experiment 1 we replicated Gentner and France's original finding of verb mutability, using word2vec rather than human judges to assess meaning change. The results bear out the key phenomena of the verb mutability hypothesis: (a) verbs changed more than nouns, and (b) this effect increased with semantic strain. Further, the fact that our word2vec results parallel Gentner and France's original findings suggests that word2vec is a feasible method for assessing semantic adjustment under paraphrasing. We are now in a position to bear down on the key question: What are the processes underlying verb mutability? Because the verbs used in Experiment 1 (as in Gentner & France, 1988) were significantly more polysemous than the nouns, the results thus far cannot distinguish between sense selection and online adjustment as accounts of mutability. We next investigate whether verb mutability holds when polysemy is controlled, or whether the pattern of greater verb mutability disappears when verbs and nouns are matched for polysemy.
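The paraphrase-length checks reported in the Discussion above can be approximated in a few lines. This is a simplified sketch under stated assumptions: "net length" counts only words that have an embedding (so stop words such as “the” drop out), the vocabulary is a hypothetical stand-in for the word2vec vocabulary, and `length_slope` fits an ordinary least-squares slope with no subject or item random effects, unlike the mixed models reported in the text.

```python
import statistics


def net_length(paraphrase, vocabulary):
    """Count the words in a paraphrase that the embedding vocabulary contains."""
    return sum(1 for word in paraphrase.lower().split() if word in vocabulary)


def length_slope(lengths, scores):
    """OLS slope of similarity score on net length (no subject/item effects)."""
    mean_x = statistics.fmean(lengths)
    mean_y = statistics.fmean(scores)
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    if sxx == 0:
        raise ValueError("lengths must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, scores))
    return sxy / sxx


# Hypothetical vocabulary: "the" and "to" are deliberately absent.
vocab = {"engine", "made", "strange", "noises", "vehicle", "responded", "well", "driver"}
lengths = [net_length(p, vocab) for p in [
    "The engine made strange noises",
    "The vehicle responded well to the driver",
]]
print(lengths)
```

A slope near zero for both nouns and verbs would correspond to the "no relationship" result described above; the mixed models in the text additionally account for repeated observations from the same participants and items.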

Experiment 2

To test whether verb meaning change is primarily driven by online adjustment or by sense selection, we followed the same procedure as in Experiment 1 but chose new nouns and verbs such that half were low polysemy (one to two senses) and half were high polysemy (7+ senses; see Fig. 3). Polysemy was evaluated by counting the number of synsets for each word in WordNet (Miller, 1995), excluding any that referred to specific people or places (the WordNet entries for each word are included in the supplementary material). Nouns and verbs were combined factorially to form intransitive sentences that comprised every possible combination of low‐ and high‐polysemy nouns and verbs.
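The polysemy banding and factorial crossing can be sketched in Python using the WordNet sense counts listed in Table 2. The sketch takes those counts as given; the manual step of excluding synsets that refer to specific people or places is not reproduced, and the function names are illustrative rather than taken from the authors' materials.

```python
from collections import Counter
from itertools import product

# WordNet sense counts for the Experiment 2 words, as listed in Table 2.
NOUN_SENSES = {"queen": 10, "professor": 1, "motor": 2, "bell": 7, "tree": 2, "box": 10}
VERB_SENSES = {"complained": 2, "dried": 2, "paused": 2,
               "suffered": 11, "failed": 13, "burned": 15}


def polysemy_band(n_senses):
    """'-' for low polysemy (1-2 senses), '+' for high polysemy (7+ senses)."""
    if 1 <= n_senses <= 2:
        return "-"
    if n_senses >= 7:
        return "+"
    raise ValueError(f"{n_senses} senses falls outside both polysemy bands")


def build_sentences(nouns, verbs):
    """Cross every noun with every verb and label each sentence's polysemy cell."""
    return [
        (f"The {noun} {verb}",
         f"N{polysemy_band(nouns[noun])}/V{polysemy_band(verbs[verb])}")
        for noun, verb in product(nouns, verbs)
    ]


sentences = build_sentences(NOUN_SENSES, VERB_SENSES)
cells = Counter(cell for _, cell in sentences)
print(len(sentences), dict(cells))
```

Crossing three low- and three high-polysemy words in each class yields 36 sentences with nine in each of the four noun/verb polysemy cells.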

Fig. 3

Stimulus matrix for Experiment 2. Shaded cells indicate combinations that result in strained sentences, following Gentner and France's (1988) approach. Pluses and minuses indicate high or low polysemy, respectively. For example, −/+ indicates a low‐polysemy noun and high‐polysemy verb combination (e.g., The motor suffered), while +/− indicates a high‐polysemy noun and low‐polysemy verb combination (e.g., The box complained).

The logic of Experiment 2 is as follows: If mutability is mainly driven by sense selection, then high‐polysemy nouns and verbs will show a greater increase in meaning change with strain than will low‐polysemy nouns and verbs, resulting in a polysemy‐by‐strain interaction. Further, if sense selection is the sole driver of meaning change, then the pattern of meaning change should be similar for nouns and verbs. This pattern would be evidence that the verb mutability effect is driven primarily by differential polysemy. In contrast, if online adjustment of verbs is the main driver of meaning change, then we should find that the degree of semantic strain predicts meaning change for both low‐ and high‐polysemy verbs but not for nouns. In this case, (a) there will be little if any effect of polysemy, and (b) the pattern of meaning change will be different for verbs than for nouns.

A total of 262 university undergraduates completed the study in person in the laboratory on a computer. They received course credit in an introductory psychology class for their participation. One participant was excluded for not being a native English speaker, and 11 were excluded for failing catch trial criteria, for a net of 250 participants.

Materials

The six nouns and six verbs were combined to form 36 new intransitive sentences. Half the nouns and verbs were low‐polysemy (N− and V−), and half were high‐polysemy (N+ and V+; see Fig. 3). Thus, across the 36 sentences, the four possible combinations of noun and verb polysemy occurred in equal numbers: nine N+/V+ combinations, nine N−/V− combinations, nine N+/V− combinations, and nine N−/V+ combinations. As in Experiment 1, participants saw each noun and verb exactly once, receiving six target sentences (two strained, four unstrained) comprising an equal number of high‐ and low‐polysemy nouns and verbs (three N−, three V−, three N+, and three V+). The noun and verb categories were modified slightly from the previous experiment: two nouns were human, two were dynamic artifacts (i.e., artifacts that are capable of performing an action), and two were static inanimate (inert) objects. The verb categories varied correspondingly, comprising two verbs that prefer human subjects, two that prefer dynamic artifacts (or humans), and two that accept all three noun categories as subjects. Following the same procedure described in Experiment 1, the 36 sentences were given to a separate group of 35 undergraduate raters who rated them for comprehensibility; the scale was then inverted to represent semantic strain (see Appendix C).
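One way to satisfy the constraint that each participant sees every noun and every verb exactly once is a cyclic Latin-square split of the 6 × 6 noun-by-verb grid into six item groupings. The sketch below is a hypothetical scheme, not the authors' actual assignment: it guarantees the once-each property and full coverage of the 36 sentences, but it does not enforce the two-strained/four-unstrained split, which depends on the shaded cells of Fig. 3.

```python
NOUNS = ["queen", "professor", "motor", "bell", "tree", "box"]
VERBS = ["complained", "dried", "paused", "suffered", "failed", "burned"]


def cyclic_groupings(nouns, verbs):
    """Split the noun x verb grid into len(nouns) item groupings.

    Grouping g pairs noun i with verb (i + g) mod k, so every grouping uses
    each noun and each verb exactly once, and the groupings together cover
    every cell of the grid exactly once.
    """
    k = len(nouns)
    if len(verbs) != k:
        raise ValueError("need equally many nouns and verbs")
    return [[(nouns[i], verbs[(i + g) % k]) for i in range(k)] for g in range(k)]


groupings = cyclic_groupings(NOUNS, VERBS)
for g, group in enumerate(groupings):
    print(g, [f"The {n} {v}" for n, v in group])
```

Because each grouping contains all six nouns and all six verbs, it automatically includes three low- and three high-polysemy words of each class.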

Experimental design

The design was 6 (item grouping, between‐subject) × 2 (item strain: strained vs. unstrained, within‐subject) × 2 (polysemy: high vs. low, within‐subject). Two simple unstrained sentences were included as catch trials to check attention and adherence to instructions; a participant was excluded for repeating the noun and/or verb in both catch trials or for producing an obviously nonsensical answer in either. Because each of the net 250 participants paraphrased six initial sentences, there were roughly 41 paraphrases per initial sentence. The procedure was identical to that of Experiment 1. The instructions to participants were the same, with the exception of a minor adjustment to the example provided to participants (see Appendix A). Using the same coding procedure as in Experiment 1, two coders who were blind to the hypotheses identified and removed mechanical paraphrases, paraphrases describing the situation, and noncompliant paraphrases. Of the 1493 total paraphrases obtained, 276 were excluded based on these criteria (107 mechanical, 144 describing the situation, and 25 noncompliant), as well as one additional paraphrase that generated a null vector (containing no words recognized by word2vec), resulting in a net of 1216 paraphrases included in the analysis. Cohen's κ was computed to assess interrater reliability. There was moderate agreement between the two judges, κ = 0.63 (95% CI, 0.58 to 0.69), p < .001. A summary of the results of the coding task is shown in Appendix B. After coding, an average of 33.77 paraphrases per item remained. Noun and verb cosine similarity scores were obtained for each paraphrase using the same procedure as in Experiment 1.
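Cohen's κ compares the coders' observed agreement with the agreement expected by chance from each coder's category frequencies. A minimal sketch follows; the category labels and the six coded paraphrases are hypothetical, and confidence intervals and p values (reported above) would require additional machinery not shown here.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' category labels on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal proportions, summed over categories.
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return float("nan")  # both coders used one identical category; kappa undefined
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six paraphrases ("keep" = retained for analysis).
coder_a = ["keep", "keep", "mechanical", "keep", "situation", "keep"]
coder_b = ["keep", "mechanical", "mechanical", "keep", "situation", "keep"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Here the coders agree on five of six items (observed .83) against a chance rate of 15/36, giving κ = 5/7.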
To test whether verbs changed more than nouns overall, we followed the same procedure as in Experiment 1: For each paraphrase, a difference score was calculated by subtracting the verb cosine score from the noun cosine score and was fit to an intercept‐only linear mixed model, with subjects and items included as random effects. Once again, the intercept was significantly greater than 0, β = 0.04, SE = 0.02, t = 2.62, p = .01, indicating that verbs (M = 0.24, SD = 0.12) changed significantly more overall than nouns (M = 0.28, SD = 0.13). Next, to test the extent to which polysemy and strain predicted semantic adjustment, two additional models were fit: one for nouns and one for verbs. In both models, the word2vec score was the dependent measure, polysemy (high vs. low), semantic strain, and the interaction term were included as fixed effects, and subjects and items were included as random effects. The results are plotted in Fig. 4.

Fig. 4

Fitted model plots showing the effect of strain and polysemy on word2vec scores for verbs and nouns in Experiment 2. Strain increases from left to right. Lower word2vec scores indicate greater meaning change. Shaded ribbons indicate 95% confidence bands.

For verbs, there was a significant main effect of semantic strain such that the degree of verb meaning change increased as strain increased, β = −0.29, SE = 0.08, t = 3.51, p = .001. There was also a significant main effect of polysemy, β = −0.22, SE = 0.08, t = 2.70, p = .01, with high‐polysemy verbs (M = 0.21, SE = 0.01) changing meaning to a greater extent than low‐polysemy verbs (M = 0.26, SE = 0.01). The interaction was not significant, β = −0.02, SE = 0.08, t = 0.28, p = .78. For nouns, a significant main effect of polysemy was found, β = −0.16, SE = 0.06, t = 2.70, p = .01, with high‐polysemy nouns (M = 0.25, SE = 0.01) changing meaning to a greater extent than low‐polysemy nouns (M = 0.30, SE = 0.01). There was no significant effect of semantic strain, β = −0.03, SE = 0.06, t = 0.45, p = .65, and the interaction was not significant, β = 0.05, SE = 0.06, t = 0.85, p = .40.

As in Experiment 1, we tested for possible confounds between strain, paraphrase length, and word2vec scores. Once again, there was no significant relationship between semantic strain and net paraphrase length, β = −0.03, SE = 0.04, t = 0.7, p = .49; paraphrases of the least‐strained item were roughly as long (M = 4.30) as those of the most‐strained item (M = 4.25). As in Experiment 1, there was no significant relationship between net paraphrase length and noun word2vec scores (β = −0.02, SE = 0.03, t = 0.58, p = .56). For verbs, a small but significant relationship was found (β = 0.11, SE = 0.03, t = 4.19, p < .001), such that verb similarity scores increased as paraphrase length increased. Note that this is in the opposite direction from that predicted by the concern discussed earlier (that verb similarity scores would be artificially depressed in longer paraphrases). Augmenting our original models with paraphrase length as a covariate yielded parameter estimates nearly identical to those of the original models.
The results of Experiment 2 point to online adjustment as the primary driver of verb mutability. Verbs changed more than nouns overall, and the degree of meaning change increased as a function of strain for both low‐ and high‐polysemy verbs. In contrast, nouns showed no effect of strain: Noun meaning change was flat from low‐ to high‐strain contexts across both levels of polysemy. Thus, despite being matched for polysemy, nouns and verbs showed distinct patterns of semantic adjustment, with verbs being the locus of change in resolving semantic strain. This result replicates Experiment 1 and supports the verb mutability effect. We also obtained a main effect of polysemy for both nouns and verbs, indicating that some sense selection was also occurring (though the effect in both cases appears smaller than the effect of strain on verb change). Importantly, however, this effect was orthogonal to both strain and word class: neither nouns nor verbs showed the interaction between polysemy and strain that is predicted by the sense selection view. Meaning change increased with strain at the same rate for low‐polysemy verbs as for high‐polysemy verbs, and both low‐ and high‐polysemy nouns remained insensitive to strain. Thus, sense selection fails to explain the asymmetry in patterns of meaning change observed between nouns and verbs and cannot fully account for the verb mutability effect.

Examining the paraphrases revealed three patterns that underscore the importance of online adjustment in driving verb mutability (see Table 2 for examples). First, we found that low‐polysemy verbs changed meaning even in sentences that paired a high‐polysemy noun with a low‐polysemy verb (e.g., The bell complained → The alarm rang annoyingly; seven noun senses, two verb senses). If sense selection were the primary driver of mutability, we would expect unbalanced sentences like these to be most favorable toward noun adjustment and verb meaning preservation.

Table 2

Example paraphrases from Experiment 2

Polysemy	Noun senses	Verb senses	Stimulus	Paraphrase
N+/V−	7	2	The bell complained	The alarm rang annoyingly
	10	2	The queen dried	The monarch aged
	10	2	The box dried	All of the contents were eaten
N−/V+	2	11	The motor suffered	The engine sputtered
	2	13	The tree failed	Someone who is usually reliable did not do their job
	1	13	The professor failed	The lecturer did not get his message across
N−/V−	2	2	The tree complained	The trunk creaked
	2	2	The motor paused	The car stalled
	1	2	The professor dried	The lecture became boring
N+/V+	10	15	The queen burned	The ruler was enraged
	7	13	The bell failed	The alarm stopped
	10	11	The box suffered	The container was crushed

Second, many verb meaning adjustments resulted in novel metaphoric extensions, regardless of the verb's (or noun's) polysemy (e.g., The box dried → All of the contents were eaten; 10 noun senses, two verb senses). The third, and perhaps most striking, pattern was that these novel metaphoric extensions sometimes occurred even when a literal interpretation was available (i.e., when the sentence was unstrained) and even when the verb was low polysemy (and the noun was high polysemy). For example, some paraphrases of The queen dried (10 noun senses, two verb senses) included The monarch aged, The monarch died, The monarch lost power, and The monarch grew cold and passionless. Thus, even when conditions were most favorable toward noun change (e.g., low‐polysemy verbs paired with high‐polysemy nouns) or toward little change at all (unstrained sentences), verbs displayed a remarkable propensity for online adjustments to their meaning.

As in Experiment 1, some items had high rates of noncompliant paraphrases, although fewer than previously (five out of 36 items had more than one‐third of their paraphrases discarded, compared to 8/36 in Experiment 1). To test whether this influenced the results, we reran the analyses on the full dataset, including all noncompliant paraphrases (1491 paraphrases: the 1493 obtained, less two that generated null vectors). The results were the same: we found a significant main effect of semantic strain for verbs but not for nouns and a significant main effect of polysemy for both nouns and verbs (and no interaction). We also confirmed that the word2vec scores for the five items with high rates of paraphrase exclusion matched the overall pattern. The mean cosine similarity scores were 0.19 and 0.17 for low‐ and high‐polysemy verbs and 0.30 and 0.23 for low‐ and high‐polysemy nouns.
Thus, the pattern of results for items with high rates of discarded paraphrases matched the overall pattern of results in the data.

Comparing the word2vec results with human judgments

Experiments 1 and 2 paint a consistent picture of greater mutability for verbs compared to nouns. But does this effect match human cognition? Our analyses have assumed that word2vec cosine similarity scores capture the degree of meaning adjustment that the noun and verb underwent when paraphrased. That our findings in Experiment 1 replicated Gentner and France's original results grants us some confidence in this assumption. Still, given the novelty of our method, it is important to compare these results with human assessments of the degree of meaning change. This replication would have the further benefit of addressing possible shortcomings of word2vec (and WEMs in general) that have been identified in the literature. For example, although word2vec and other WEMs have been shown to match human similarity judgments well in some tasks (e.g., Günther, Dudschig, & Kaup, 2016; Landauer & Dumais, 1997; Landauer, Foltz, & Laham, 1998; Pereira et al., 2016), there are concerns as to their ability to distinguish similarity from association (Hill, Reichart, & Korhonen, 2015; Lenci, 2018; Pereira et al., 2016; Simmons & Estes, 2006). There are also concerns related to polysemy. For example, Gerz, Vulić, Hill, Reichart, and Korhonen (2016) found that WEM correlations with human similarity judgments were lower for high‐polysemy verbs than for low‐polysemy verbs. Although they did not test nouns, it is plausible that the same pattern applies. Therefore, to address these concerns, in Experiment 3, we sought to replicate the results of Experiment 2 using a behavioral assessment of meaning change: the double‐paraphrase task developed by Gentner and France (1988).

Experiment 3

As described in the Introduction, Gentner and France (1988) used three different behavioral approaches to assess the degree to which nouns and verbs changed meaning under paraphrase: divide‐and‐rate, retrace, and double paraphrase. All three provided converging evidence for the verb mutability effect, but they were also labor‐intensive and prone to high amounts of data loss. Of the three, the double‐paraphrase task is most appealing for our present purpose because it is the most hands‐off approach. No judges are needed to divide the paraphrase into component pieces (as in the divide‐and‐rate method), nor is it necessary to ask raters to match each paraphrase with a fixed list of the initial nouns or verbs (as in the retrace task). Further, the strict criterion of requiring an exact match between the initial noun or verb and its appearance in the paraphrase eliminates subjective judgments about the degree of change. In the double‐paraphrase task, the original paraphrases are given to a new set of participants for them to paraphrase—that is, to produce a “double” paraphrase. The double paraphrase is then scored for noun and verb resurfacings. A resurfacing occurs when the original stimulus noun or verb reappears in the double paraphrase. The assumption is that words whose meaning has been preserved in the original paraphrase will be most likely to resurface in the double paraphrase, as in the following example: Here, the stimulus noun motor from Experiment 2 has resurfaced in the double paraphrase, while the verb complained has not. This matches intuition: engine is very similar to motor, while functioned badly represents a much greater adjustment to the meaning of complained. The strict criterion of an exact match (although we accepted differences in pluralization or tense) provides an objective scoring procedure. 
The tradeoff is data loss, since many near‐matches are discarded; for example, The oak was on fire would not count as a resurfacing for The tree burned for either the noun or the verb. For our present purposes, however, we wished to use unambiguous criteria to serve as a benchmark for the word2vec results from the previous experiment.

Seventy‐seven participants completed the study online via Mechanical Turk. The task took approximately 15 minutes, and they were paid at a rate equivalent to Illinois’ minimum wage at the time of the study. Four participants were excluded for failing the catch trial criteria, and two were excluded due to experimenter error, resulting in a net of 71 participants.

The 1216 participant paraphrases from Experiment 2 served as the stimuli for Experiment 3. Participants in Experiment 3 received the same instructions as participants in Experiments 1 and 2, with the addition of a sentence instructing them to use their best guess as to the meaning of any misspelled words in the sentences and to ignore any typos to the best of their ability (see Appendix A). For brevity and clarity, in what follows, we refer to the first set of paraphrases obtained in Experiment 2 (which serve as the stimuli/initial sentences in this experiment) as singles and the responses generated in the present experiment (the paraphrases of those singles) as doubles. Singles were grouped into two between‐subject item groupings based on their initial stimulus sentence in Experiment 2. These item groupings were organized so that each participant paraphrased 18 singles, as well as two catch trials that served as attention checks. The 18 singles were presented in three blocks of six items each, with order randomized within each block. Within each block, each of the original six stimulus nouns and verbs (from which the single paraphrase originated) was represented exactly once (so that each occurred three times total for each participant).
This blocked design ensured that participants did not paraphrase singles coming from the same original noun or verb consecutively. In addition, because removing mechanical and noncompliant paraphrases in Experiment 2 resulted in an uneven number of singles per original stimulus item, “dummy” singles were included to ensure a uniform experience across participants within each assignment condition. The goal was to obtain doubles of as many of the 1216 singles from Experiment 2 as possible while also ensuring that each participant was matched on the criteria described above. This resulted in 1385 items in total: 1158 target items and 227 “dummy” items that were paraphrased by participants but excluded from the analysis. The procedure matched that of Experiments 1 and 2, except that participants paraphrased 18 sentences instead of 6. All of the stimulus items were paraphrases obtained from Experiment 2.

Scoring

Of the original 1158 doubles, 101 were excluded because they came from the six excluded participants. An additional 45 doubles were excluded due to experimenter error, for a net of 1012 doubles included in the analysis. Among these 1012 doubles, the number obtained per original stimulus item from Experiment 2 (e.g., The motor complained) ranged from 15 to 34, with a mean of 28.11 and a median of 29.5. Paraphrases were then scored for noun and verb resurfacings. A strict criterion was used: only identical resurfacings counted, except for changes in tense or pluralization.
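The strict resurfacing criterion can be sketched as an exact word match against a small set of accepted forms. The `ACCEPTED_FORMS` table below is hypothetical: it stands in for whatever tense and number variants the coders actually accepted, and a real scorer would need an entry for every stimulus noun and verb. The example double paraphrase is also invented for illustration.

```python
import re

# Hypothetical accepted forms: each stimulus word plus the tense or number
# variants accepted under the strict criterion.
ACCEPTED_FORMS = {
    "motor": {"motor", "motors"},
    "complained": {"complain", "complains", "complained"},
}


def resurfaced(target, double_paraphrase, accepted_forms=ACCEPTED_FORMS):
    """True if the target word, or an accepted inflection, appears verbatim."""
    forms = accepted_forms.get(target, {target})
    words = re.findall(r"[a-z]+", double_paraphrase.lower())
    return any(word in forms for word in words)


def score_double(noun, verb, double_paraphrase):
    """Return (noun_resurfaced, verb_resurfaced) for one double paraphrase."""
    return resurfaced(noun, double_paraphrase), resurfaced(verb, double_paraphrase)


print(score_double("motor", "complained", "The motor was not working properly"))
```

Near-synonyms such as *engine* never count, which is exactly the data-loss tradeoff described for the double-paraphrase task.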

Analysis

Resurfacing counts by class and polysemy are given in Table 3. As expected, overall data loss (paraphrases where neither the verb nor the noun resurfaced) was high: Out of a possible 1012 paraphrases, nouns resurfaced a total of 214 times and verbs resurfaced a total of 104 times.

Table 3

Number of resurfacings (hits) versus nonresurfacings (misses) for nouns and verbs from Experiment 3^a

Polysemy	Verb hits	Verb misses	Verb total	Verb hits^b (%)	Noun hits	Noun misses	Noun total	Noun hits^b (%)
Low	68	422	490	13.88	124	389	513	24.17
High	36	486	522	6.90	90	409	499	18.04
Total	104	908	1012	10.28	214	798	1012	21.15

Note. aThese numbers include 27 instances in which both the noun and verb resurfaced. bPercentages do not sum to the number in the Total row due to uneven cell counts (see Sections 4.1.2 and 4.2.1).

To test whether the overall difference in noun and verb resurfacings was significant, a difference score for each paraphrase was calculated in the following way: If the noun resurfaced but not the verb, it was scored as a 1. If the verb resurfaced but not the noun, it was scored as a 0. If neither or both resurfaced, it was considered a tie, and that response was excluded (there were 27 instances in which both the noun and verb resurfaced). Next, a mixed effect logistic regression model was fit, with difference score as the dependent measure, the intercept as the only fixed effect, and subjects and items as random effects. The intercept differed significantly from 0, β = 1.24, SE = 0.33, 95% CI [0.67, 1.97], z = 3.79, p < .001, indicating that when only one of the two words resurfaced, it was significantly more likely to be the noun (187 noun‐only resurfacings) than the verb (77 verb‐only resurfacings); on the probability scale, the intercept corresponds to an estimated probability of about .78 that a single‐word resurfacing was the noun. Next, to test the effect of semantic strain and polysemy on verb and noun resurfacings, two additional mixed effect logistic regression models were fit: one for nouns and one for verbs. Noun/verb resurfacings were the dependent measures in their respective models, with polysemy (high vs. low), strain, and the interaction term included as fixed effects and subjects and items as random effects. The fitted model results are plotted in Fig. 5.
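The tie-excluding coding and the reading of the intercept can be illustrated with the counts reported above. The intercept of a logistic model is on the log-odds scale, so the inverse logit converts it to a probability; because the model includes subject and item random effects, that probability applies to a typical participant and item and need not equal the raw share of noun-only cases. The function names are illustrative.

```python
import math


def tie_coded(noun_hit, verb_hit):
    """1 = noun-only resurfacing, 0 = verb-only, None = tie (excluded)."""
    if noun_hit == verb_hit:
        return None
    return 1 if noun_hit else 0


def inv_logit(x):
    """Map a log-odds value onto the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


noun_only, verb_only = 187, 77                    # counts reported in the text
raw_share = noun_only / (noun_only + verb_only)   # observed share of noun-only cases
model_share = inv_logit(1.24)                     # reported intercept, log-odds scale
print(f"observed {raw_share:.3f}, model {model_share:.3f}")
```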

Fig. 5

Fitted models showing the probability of resurfacing for verbs and nouns in Experiment 3. Lower probabilities indicate greater meaning change. Strain increases from left to right. Shaded ribbons indicate 95% confidence bands.

For verbs, there was a significant main effect of semantic strain, β = −0.25, SE = 0.11, 95% CI [−0.50, −0.04], z = 2.31, p = .02, indicating that verbs resurfaced less often as strain increased. There was also a significant main effect of polysemy, β = −0.63, SE = 0.23, 95% CI [0.21, 1.17], z = 2.69, p < .01, indicating that low‐polysemy verbs (68 resurfacings) were more likely to resurface than high‐polysemy verbs (36 resurfacings). The interaction was not significant, β = 0.08, SE = 0.11, 95% CI [−0.15, 0.35], z = 0.76, p = .45. For nouns, there was no significant effect of semantic strain, β = −0.02, SE = 0.05, 95% CI [−0.12, 0.07], z = 0.40, p = .69. There was a marginal main effect of polysemy, β = −0.20, SE = 0.10, 95% CI [−0.01, 0.41], z = 1.89, p = .06. The interaction was not significant, β = 0.01, SE = 0.05, 95% CI [−0.09, 0.10], z = 0.12, p = .90.

A full replication of Experiments 1 and 2 required the following three results: (a) verbs should resurface less often than nouns overall (indicating greater meaning change overall), (b) resurfacings should decrease with semantic strain for verbs but not for nouns, and (c) high‐polysemy nouns and verbs should resurface less often than low‐polysemy nouns and verbs across all levels of strain. The results of the double‐paraphrase task support all three predictions. As in Experiments 1 and 2, (a) verbs changed more than nouns overall (they resurfaced less); (b) semantic strain significantly predicted verb, but not noun, change; and (c) high‐polysemy nouns and verbs changed more (resurfaced less often) than low‐polysemy nouns and verbs (though the effect was marginal for nouns, p = .06).
These results parallel the word2vec results in Experiments 1 and 2, supporting the use of word2vec to assess the degree of meaning change in our paraphrase task. To be clear, we are not suggesting that word2vec's embeddings match human representations of word meaning, nor that calculating cosine similarity scores serves as a model of the human comparison process. Nonetheless, the word2vec scores here appear to capture human patterns in the present task, including the effects of polysemy, rather effectively.
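As a rough illustration of how a cosine similarity score quantifies meaning change, here is a sketch with made-up three-dimensional vectors (real word2vec embeddings have hundreds of dimensions). It assumes, for illustration only, that a multiword paraphrase is represented by the average of its in-vocabulary word vectors; the studies' exact comparison procedure may differ. Lower similarity between an original word and its paraphrase indicates greater meaning change. A paraphrase with no in-vocabulary words yields no vector at all, which is the situation that led two Experiment 2 paraphrases to be dropped from the analysis.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def mean_vector(words: List[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of in-vocabulary words; None if no word is in the vocabulary."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None  # a "null vector": there is nothing to compare
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, not real word2vec values.
toy = {
    "complained": [0.9, 0.1, 0.0],
    "whined": [0.8, 0.2, 0.0],
    "rang": [0.3, 0.7, 0.2],
    "annoyingly": [0.5, 0.5, 0.1],
}

original = toy["complained"]
literal = mean_vector(["whined"], toy)
extended = mean_vector(["rang", "annoyingly"], toy)
print(f"complained vs. whined:         {cosine(original, literal):.3f}")
print(f"complained vs. rang annoyingly: {cosine(original, extended):.3f}")
print(mean_vector(["xylograph"], toy))  # None: no in-vocabulary words
```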

Qualitative differences in noun and verb change

Experiments 1–3 show that verb change and noun change differ quantitatively in the degree of meaning change each is prone to undergo. Another important question is whether verb and noun meaning change also differ qualitatively. That is, in addition to changing more than nouns, do verbs also differ from nouns in how they typically change? Thus far in this paper, we have focused mainly on metaphor as the primary way in which verbs extend their meanings. But words can change meaning in many other ways as well, such as through synonymous substitutions (e.g., motor → engine; burn → combust), taxonomic substitutions (e.g., motor → machine; burn → change), or metonymic substitutions (e.g., motor → car; burn → turned into ash). We ask whether verbs’ greater mutability compared to nouns is accompanied by distinct qualitative patterns of meaning change. We expect that verbs will have a greater propensity for metaphoric/analogical extension than nouns. As discussed earlier, metaphoric uses of verbs appear to be significantly more common in day‐to‐day language than metaphoric uses of nouns (Jamrozik et al., 2013; Krennmayr, 2011). A second expectation is that nouns will be more likely than verbs to be paraphrased with a taxonomic substitution: either a more general term (as in car → vehicle) or a more specific one (as in car → Jeep). Intuitively, a taxonomic paraphrase is a way to preserve the likely referent of the original noun while using new content words. Further, taxonomic substitutions may be more available for nouns than for verbs; a number of studies have found that noun concepts are taxonomically structured to a greater extent than verb concepts (e.g., Burnett & Gentner, 2000; Fellbaum, 1999; Graesser, Hopkinson, & Schmid, 1987; Huttenlocher & Lui, 1979; Miller & Fellbaum, 1991; Pavličić & Markman, 1997; Qiu, Castro, & Johns, 2021). For example, Graesser et al. (1987) found that participants in a free‐sort task consistently categorized nouns, but not verbs, in a way that correlated with the pattern shown in a separate taxonomic organization task. That is, participants spontaneously organized nouns, but not verbs, taxonomically. Further, there is evidence that people sometimes produce “chain reversals” for verbs, for example, saying both that drinking is a kind of swallowing and that swallowing is a kind of drinking, or that thinking is a type of reasoning and reasoning is a type of thinking (Burnett & Gentner, 2000; Rips & Conrad, 1989). Burnett and Gentner (2000) found that this occurred more often for verbs than for nouns, again suggesting that nouns are organized into stable taxonomies to a greater extent than are verbs.

Finally, a third expectation is that nouns will be more prone to metonymic extensions than verbs. Metonymy is a well‐established aspect of noun usage (e.g., Nunberg, 1995; Pustejovsky, 1995), and metonymic relationships are widespread among nouns, both as lexicalized senses (e.g., a container‐contained relation, as in I ate the whole box) and as novel meaning extensions (e.g., saying the ham sandwich over there to refer to a customer at a diner; Nunberg, 1979). In contrast, the set of verbs that are frequently used metonymically (e.g., begin, enjoy) appears relatively small (Utt, Lenci, Padó, & Zarcone, 2013). Verb metonymy typically manifests as one part of an event standing for the event as a whole. For example, in the writer began the novel, the verb began stands for the event began to write (Nunberg, 1995). That nouns and verbs appear to differ in their relative predispositions toward metaphoric, metonymic, and taxonomic organization raises the possibility that these differences might show up at the level of online sentence processing.
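In a stable taxonomy, "is a kind of" is asymmetric, so a chain reversal appears as the same pair of words judged in both directions. The sketch below flags such pairs in a set of hypothetical judgments; the pairs echo the examples above, but the data and the function name are illustrative only.

```python
from typing import FrozenSet, Set, Tuple

Judgment = Tuple[str, str]  # (a, b) means "a is a kind of b"


def chain_reversals(judgments: Set[Judgment]) -> Set[FrozenSet[str]]:
    """Return the word pairs that were judged to be kinds of each other in both directions."""
    return {
        frozenset((a, b))
        for a, b in judgments
        if a != b and (b, a) in judgments
    }


judgments = {
    ("drinking", "swallowing"), ("swallowing", "drinking"),
    ("thinking", "reasoning"), ("reasoning", "thinking"),
    ("car", "vehicle"),  # one-directional, consistent with a taxonomy
}
print(chain_reversals(judgments))
```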
To investigate this question, we gave a randomly chosen subset of the paraphrases from Experiment 2 (16 paraphrases from each item, for a total of 576 of the original 1216 paraphrases) to two coders who were blind to the hypotheses. The coders were graduate students in linguistics and were paid for their time. For each paraphrase, the coders classified the type of change that the original noun and the original verb underwent into one of seven types: synonym/highly similar, taxonomic, contextual taxonomic, associative (metonymic), metaphoric (analogous), describes the situation, and other (see Table 4). Interrater reliability was assessed with Cohen's κ. Initial agreement between the two coders was moderate, κ = 0.58, 95% CI [0.55, 0.61], p < .001; after discussion, consensus was reached on all items.
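Cohen's κ corrects the coders' raw agreement for the agreement expected by chance from each coder's overall use of the codes: κ = (p_observed − p_chance) / (1 − p_chance). A minimal sketch, with hypothetical code assignments for six items:

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders who each assigned one category per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for six paraphrased verbs.
coder_a = ["metaphoric", "metaphoric", "taxonomic", "taxonomic", "synonym", "synonym"]
coder_b = ["metaphoric", "taxonomic", "taxonomic", "taxonomic", "synonym", "metaphoric"]
print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")
```

Here the coders agree on 4 of 6 items (p_observed = 2/3), chance agreement is 1/3, and κ is 0.5.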

Table 4

Code	Definition (Summarized)	Noun Example	Verb Example
Synonym/highly similar	A synonym or highly similar term in a literal sense	The dad yelled → The father shouted	The dog barked → The canine growled
Taxonomic high	A superordinate term	The car drove → The vehicle moved	The car drove → The vehicle moved
Taxonomic low	A subordinate term	The person walked → The man sauntered	The person walked → The man sauntered
Contextual taxonomic high/low	A superordinate or subordinate term that is so only in the context established by the sentence	The barrier melted → The iceberg liquified	The radio worked → The receiver received the signal
Associative (metonymic)	A term that is associated (e.g., part‐whole) rather than similar or taxonomically related, and that does not share an abstract commonality	The engine functioned → The car worked	The dog growled → The canine trembled
Metaphoric (analogous)	A term involving an analogy or abstract commonality with the original word	The school was full → The prison was at capacity	The car limped → The vehicle drove slowly
Describes the situation	A term that describes the surrounding context instead of providing a paraphrase	The eggs sizzled → Breakfast is ready
Other/uninterpretable	Uninterpretable or not fitting into any of the above categories	No example was provided to the coders

Codes used in the qualitative analysis. The definitions here are summaries of longer explanations given to the coders; examples are drawn from a larger set that was given to the coders. Coders received an equal number of noun and verb examples for each code.

The tallies for all code categories are given in Appendix D. In what follows, we focus on our three codes of primary interest: metaphoric/analogous, associative/metonymic, and taxonomic (these were also the most common codes, with the exception of synonym/highly similar). Fig. 6a shows the overall code tallies for nouns and verbs. As expected, verbs often changed metaphorically (165 occurrences), while nouns rarely did (27 occurrences). Also as expected, taxonomic substitutions occurred more often for nouns (251 occurrences) than for verbs (101 occurrences), as did associative substitutions (146 for nouns, 110 for verbs).
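Because the coded subset contained 576 paraphrases, the tallies can also be read as proportions. The sketch assumes that each paraphrase contributed exactly one code for its noun and one for its verb, so that 576 is the denominator for each word class; on that assumption, roughly 29% of verbs but under 5% of nouns were coded as metaphoric.

```python
N_CODED = 576  # paraphrases in the coded subset (assumed one code per noun and per verb)

# Tallies reported in the text for the three codes of primary interest.
TALLIES = {
    "metaphoric": {"noun": 27, "verb": 165},
    "taxonomic": {"noun": 251, "verb": 101},
    "associative": {"noun": 146, "verb": 110},
}


def share(count: int, total: int = N_CODED) -> float:
    """Proportion of coded words (nouns or verbs) that received a given code."""
    return count / total


for code, counts in TALLIES.items():
    print(f"{code:12s} nouns {share(counts['noun']):6.1%}   verbs {share(counts['verb']):6.1%}")
```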

Fig. 6

Tallies for the metaphoric (analogous), associative (metonymic), and taxonomic categories for nouns and verbs from the qualitative analysis. (a) Total counts. (b) Tallies by strain quartile, with strain increasing from left to right. (c) Tallies by word2vec quartile. The x‐axes are reversed so that change increases from left to right, with Quartile 4 representing the least degree of change (highest word2vec scores) and Quartile 1 representing the greatest degree of change (lowest word2vec scores).

Fig. 6b plots the distribution of codes for nouns and verbs by strain quartile (based on participants’ ratings in Experiment 2). For verbs, rates of metaphoric responding increased steadily as strain increased, confirming that, as verbs changed meaning in response to strain, they did so mainly via metaphoric extension. For nouns, however, there were no clear trends across strain for most codes, consistent with the idea that verbs were the locus of change. As expected, associative and taxonomic substitutions were more common for nouns, while rates of metaphoric responding were consistently low. Fig. 6c shows the distribution of codes by word2vec quartile, where Quartile 4 represents the paraphrases in which the noun or verb changed least (i.e., had the highest word2vec similarity scores) and Quartile 1 represents those in which it changed most (here, the x‐axis has been reversed so that the degree of change increases from left to right, matching the direction of increasing semantic strain in Fig. 6b). For verbs, a clear relationship can be seen between the degree of meaning change (word2vec quartile) and the frequency of metaphoric responding: the further a verb's meaning changed, the more likely that change was to be a metaphoric extension. The pattern was quite different for nouns, for which few metaphoric substitutions occurred at any degree of meaning change. Instead, across all degrees of meaning change (i.e., across all word2vec quartiles), participants mostly made taxonomic substitutions, with associative substitutions next most common.

These results support a novel conclusion: In addition to quantitative differences in meaning change, there are also qualitative differences in how nouns and verbs change meaning. When verbs adapt their meanings to context, they mainly do so via metaphor. When nouns adapt their meanings, they do so via taxonomic or associative (metonymic) relations. Thus, in addition to their greater mutability, verbs also appear to be more amenable to metaphoric extensions than nouns.
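One way to produce quartile tallies like those in Fig. 6 is to cut the scores at their quartile boundaries and count the codes within each bin. This is a sketch under stated assumptions: the scores and codes are hypothetical, the boundaries come from Python's `statistics.quantiles` (default exclusive method), and a score falling exactly on a boundary goes to the higher quartile. The studies may have binned differently.

```python
import bisect
import statistics
from collections import Counter
from typing import Dict, List


def quartile_of(score: float, cuts: List[float]) -> int:
    """Quartile 1 = lowest scores (most change) ... Quartile 4 = highest (least change)."""
    return bisect.bisect_right(cuts, score) + 1


def tally_by_quartile(scores: List[float], codes: List[str]) -> Dict[int, Counter]:
    """Count each code within each word2vec-score quartile."""
    cuts = statistics.quantiles(scores, n=4)  # three cut points
    tallies = {q: Counter() for q in (1, 2, 3, 4)}
    for score, code in zip(scores, codes):
        tallies[quartile_of(score, cuts)][code] += 1
    return tallies


# Hypothetical similarity scores for eight paraphrased verbs and their codes.
scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
codes = ["metaphoric", "metaphoric", "metaphoric", "taxonomic",
         "taxonomic", "taxonomic", "synonym", "synonym"]

tallies = tally_by_quartile(scores, codes)
# Print in reversed order (Q4 -> Q1) so that the degree of change increases left to right.
for q in (4, 3, 2, 1):
    print(f"Quartile {q}: {dict(tallies[q])}")
```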

General discussion

There are three main findings. First, we obtained strong and consistent evidence for the verb mutability effect. Second, we found that online adjustment is the primary driver of verb mutability. Third, we identified qualitative differences in how nouns and verbs change meaning. Also, on a methodological level, we found that word2vec's cosine similarity scores for the original words and their paraphrases aligned well with human judgments of the degree of semantic change. We next review these findings.

Verbs change more than nouns

All three studies provided clear evidence for the verb mutability effect: under semantic strain, verb meanings are altered more than noun meanings. In Experiment 1, we replicated Gentner and France's (1988) original verb mutability findings using a subset of their stimuli. We asked people to paraphrase simple “The noun verbed” sentences that varied in semantic strain. The results showed (a) that verbs changed more than nouns overall and (b) that the degree of verb meaning change increased with the degree of strain. In contrast, noun meanings remained stable across strain. In Experiment 2, we replicated these findings while systematically varying noun and verb polysemy. In Experiment 3, we replicated our word2vec findings from Experiment 2 using a behavioral assessment of meaning change (the double‐paraphrase task) instead of word2vec scores. Thus, the verb mutability effect held across different sets of stimuli, different levels of noun and verb polysemy, and different methods of assessing semantic change. When a sentence requires a novel interpretation, it is the verb that alters its meaning.

Online adjustment drives verb mutability

In Experiment 2, we tested whether differential polysemy could explain the greater mutability of verbs. If meaning change occurs largely through selecting an appropriate sense of the verb (or noun), then more polysemous words should show greater meaning change under strain. To test this, we created a new set of sentences that systematically varied the polysemy of the nouns and verbs while independently varying semantic strain. Not surprisingly, there was a main effect of polysemy for both nouns and verbs, indicating that some sense selection occurred. Importantly, however, we did not obtain the interaction between polysemy and strain that would be expected if sense selection were the primary driver of mutability. Instead, both low‐ and high‐polysemy verbs showed greater change of meaning as the strain increased, and both low‐ and high‐polysemy nouns remained equally stable (Fig. 4). Thus, the effect of polysemy was orthogonal to that of semantic strain and cannot explain the asymmetry between nouns and verbs. Further, we observed instances in which people generated novel metaphoric extensions for verbs even when conditions were favorable to greater sense selection in nouns than in verbs—for example, when a low‐polysemy verb was paired with a high‐polysemy noun (e.g., The bell complained → The alarm rang annoyingly). Strikingly, this sometimes happened even when a literal interpretation was available (e.g., The box dried → All of the contents were eaten). In sum, selection from among existing word senses cannot explain the verb mutability pattern (greater change in verb meaning than in noun meaning and greater change in verb meaning as strain increases). We are left with the conclusion that online adjustment is the primary driver of verb mutability. In short, verbs appear remarkably willing to extend their meanings in a way that nouns are not. Indeed, it may be that verbs’ greater mutability is what leads to their relatively high polysemy.

Qualitative differences in noun and verb change

Our third main finding was that verbs and nouns differ qualitatively in how they change meaning. To our knowledge, no prior work has looked at this question. Coding a subset of the paraphrases from Experiment 2, we found that verbs were more likely to extend their meanings metaphorically/analogically than were nouns overall. Noun change was more likely to be via taxonomic substitution or metonymic association; metaphoric extension was rare for nouns. Further, the rate of verb metaphoric extension increased sharply with the degree of strain. In contrast, the rates of all types of noun substitutions (including taxonomic and metonymic substitutions) were largely flat across strain.

Characterizing verb meaning change

In examining the paraphrases from these studies, we observed another important pattern in meaning change—in this case, among the verbs themselves. Across paraphrases, verb meaning change tended to follow two principles. First, verbs typically changed only as far as was required to resolve the semantic strain. Second, verbs changed in such a way that domain‐specific meaning components were adjusted before more abstract relational ones. For example, consider the set of paraphrases below for the verb complained from Experiment 2. In this example, strain increases with the degree of semantic mismatch between noun and verb as one moves from (1) to (3). Sentence (1) is unstrained since the verb receives its preferred (human) subject type. Sentence (2) is moderately strained in that, although bells are inanimate artifacts, they are saliently associated with making a sound. Sentence (3) is highly strained; boxes are inanimate and also not known for making a sound. As the paraphrases show, the degree of verb change increases progressively with strain. The paraphrase of (1)—which is unstrained and literally interpretable—largely retains the standard meaning of complain. In the paraphrase of (2), the domain‐specific components of complain’s meaning have been adjusted from referring to human verbal communication to a more general meaning involving producing an (annoying) sound. In the paraphrase of (3), the verb is abstracted further so that the meaning components having to do with sound are discarded entirely; only the abstract relational notion that complaining indicates a bad state of affairs is retained. Thus, verb meaning change is gradual rather than radical. This pattern of progressive meaning change in verbs was first identified by Gentner and France (1988), who termed it minimal subtraction. Recent work in cognitive neuroscience looking at verb processing has found activation patterns that are consistent with this pattern. 
A number of studies have found that cortical activation shifts anteriorly from primary perceptual processing areas when a verb is used literally to adjacent secondary areas when it is used figuratively (Cardillo et al., 2012; Chatterjee, 2008; Chen, Widick, & Chatterjee, 2008; Desai, Binder, Conant, Mano, & Seidenberg, 2011, 2013; Jamrozik et al., 2016; Raposo, Moss, Stamatakis, & Tyler, 2009; Saygin, McCullough, Alac, & Emmorey, 2010; Wallentin, Ostergaard, Lund, Ostergaard, & Roepstorff, 2005). These adjacent anterior areas are associated with the processing of abstract concepts (Cardillo et al., 2012; Chatterjee, 2008). Thus, our finding that domain‐specific meaning components (i.e., sensorimotor components) are retained when a verb is used literally but are abstracted away when a verb is used metaphorically parallels imaging studies showing similar shifts from sensorimotor areas to adjacent areas associated with abstract processing. These findings also bear on the question of personification, an area of debate among linguists. As Dorst (2011) describes, at one level, any instance in which the noun violates the verb's selectional preferences can be considered personification, that is, an invitation to construe the noun as animate/human. This account appears to stand in contrast to our argument here that the verb, rather than the noun, is what is reconstrued. However, Dorst also notes that the interpretation of such violations varies according to the field of study and the purpose of the analysis. Our analysis focused on the semantic‐conceptual level, that is, on how people interpreted the words in strained sentences.
In this analysis, we found that, although there were a few instances in which an inanimate noun was paraphrased as an animate being (e.g., The motor complained → The talkative Tracy was on her usual rant), in the great majority of the paraphrases, the noun largely retained its usual meaning, and the verb adapted its meaning to fit the noun's meaning (e.g., The motor complained → The vehicle was noisy and struggling).
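Gentner and France (1988) termed the pattern of progressive verb change described in this section minimal subtraction. Its two principles (change only as far as needed to resolve the strain; give up domain-specific components before abstract relational ones) can be made concrete in a toy sketch. The component list for complain and the components each noun can support are hypothetical simplifications of the complain paraphrases discussed above, not a model proposed in these studies.

```python
from typing import List, Set

# Hypothetical meaning components of "complain", ordered from most
# domain-specific (first) to most abstract and relational (last).
COMPLAIN = [
    "human agent",
    "verbal communication",
    "produces a sound",
    "signals a bad state of affairs",
]


def minimally_subtract(components: List[str], supported: Set[str]) -> List[str]:
    """Drop components from the domain-specific end until every remaining one fits the noun."""
    kept = list(components)
    while kept and not all(c in supported for c in kept):
        kept.pop(0)  # discard the most domain-specific component still present
    return kept


# Hypothetical components each subject noun can support.
NOUNS = {
    "professor": set(COMPLAIN),
    "bell": {"produces a sound", "signals a bad state of affairs"},
    "box": {"signals a bad state of affairs"},
}

for noun, supported in NOUNS.items():
    print(f"The {noun} complained ->", minimally_subtract(COMPLAIN, supported))
```

Note that the loop never removes more than it must: as soon as all remaining components fit the noun, it stops, so the unstrained sentence keeps the full verb meaning.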

Mutability and meaning change over time

Our findings also connect to work on language evolution. There is evidence that verbs change their meanings at a greater rate over time than nouns do (Dubossarsky et al., 2016; Sagi, 2019). For example, Dubossarsky et al. (2016) compared rates of change for nouns, verbs, and adjectives from 1850 to 2000. They found that verbs changed meaning at a higher rate than both nouns and adjectives over the entire period of analysis. Dubossarsky et al. linked their results with the verb mutability effect: “The verb mutability effect identified by Gentner (1981) may be one kind of synchronic interpretative bias implicated in the diachronic asymmetry observed in the present article: In terms of synchronic processing, verbs are more semantically mutable than nouns; correspondingly, in terms of diachronic change over time, verbs undergo more semantic change than nouns” (p. 20). An important question is the extent to which these diachronic meaning changes are due to metaphoric extensions of verb meaning. There is widespread agreement among both psychologists (e.g., Bowdle & Gentner, 2005; Gentner & Asmuth, 2017; Gentner & Wolff, 2000; Xu et al., 2017) and linguists (e.g., Heine, 1997; Hopper & Traugott, 2003; Joseph et al., 1996; Narrog & Heine, 2021; Sweetser, 1990; Traugott, 1988) that metaphor is an important vehicle for language change over time. For example, in a computational historical analysis examining 5000 metaphorical mappings spanning 1100 years, Xu et al. (2017) found that new word senses most frequently emerged from metaphorical mappings originating from concrete source domains to more abstract domains. For example, the cognitive sense of reflect emerged from a metaphorical mapping from light to thought.
Our finding that the verb mutability effect is driven primarily by online adjustment and that verbs have a higher propensity for metaphoric extensions than nouns suggests an intriguing link between verb mutability, online metaphoric extensions, and meaning change over time.

Why do verbs change more than nouns?

Our findings here invite an explanation of why verbs undergo more online change than do nouns. We next consider factors that may drive verb mutability.

Syntactic influences: Word order

The simplest account is that the SVO word order typical of English (and the SV order of our stimuli) establishes the primacy of the subject noun as the context to which the verb must adapt. Although this is plausible to a certain extent, prior work has shown that word order cannot account for verb mutability on its own. Gentner and France (1988, Experiment 2) found a greater semantic change in verbs than in nouns even when the verb was the first word in the sentence (e.g., Worshipped was what the lizard did). Thus, word order alone is unlikely to be a major driver of verb mutability.

Pragmatic influences: Predicate role

Another possible factor underlying verb mutability lies in the pragmatics of sentence interpretation: specifically, the fact that verbs typically serve in the predicate role in a sentence. As Gentner and France (1988, p. 372) suggested, “… verbs have the job of conveying relations or events that apply to the referents established by the nouns.” More generally, Croft (1993) observed that sentence elements that depend on another element for their meaning (like verbs and adjectives) are the ones that typically change meaning in figurative statements, while the autonomous elements they depend on (often nouns) establish the domain to which they must adapt. As support for the claim that occupying the predicate position contributes to mutability, Gentner and France noted that this pattern holds even for within‐class constructions, such as noun–noun metaphors. For example, in That surgeon is a butcher, the noun in the predicate position (butcher) is the one interpreted metaphorically, yielding a sloppy, brutish surgeon. In contrast, the reverse metaphor, That butcher is a surgeon, suggests a deft, precise butcher. As another example, in noun–noun conceptual combination, the predicate noun typically adapts its meaning to the referent noun (Murphy, 1990; Wisniewski, 1997). Thus, an acrobat hippopotamus is an agile hippopotamus, while a hippopotamus acrobat is a clumsy acrobat. In both these examples, the meaning of the referent term is held constant, while the predicate term is adapted to provide information about the referent. Thus, we suggest that verb mutability is partly driven by the verb's role as a predicate in a sentence.

Semantic influences: Relationality of meaning

Another potential contributor to verb mutability is relationality of meaning. It has been argued that relationality is a key feature of verb meaning; that is, while nouns often refer to objects or object concepts, verbs typically express relations among those referents (Baker et al., 1998; Croft, 2000, 2001; Fillmore, 1971; Jackendoff, 1983; Langacker, 1987, 2008; Levin, 1993; Talmy, 1975, 1988, 2000; Vigliocco, Vinson, Druks, Barber, & Cappa, 2011). We suggest that relationality imposes additional pressure to adjust meaning over and above the pragmatic function of predication (although, as discussed below, the two factors normally work in tandem). One way to test the importance of relationality of meaning per se is to compare the mutability of two words from the same syntactic class that differ in relationality. Asmuth and Gentner (2017) conducted such a test by comparing the mutability of relational nouns and entity nouns. As mentioned earlier, entity nouns are nouns whose referents share common intrinsic properties (as well as common relational structure)—e.g., tiger, apple. Relational nouns are nouns whose referents share a common relational pattern but not common intrinsic properties—for example, carnivore, barrier (Asmuth & Gentner, 2017; Gentner & Asmuth, 2017; Gentner & Kurtz, 2005; Goldwater & Markman, 2011; Goldwater, Markman, & Stilwell, 2011; Markman & Stilwell, 2001; Rehder & Ross, 2001). Emulating Kersten and Earles’ (2004) recognition paradigm, Asmuth and Gentner (2017) gave participants phrases consisting of an entity noun and a relational noun—for example, truck limitation. At a later surprise recognition test, recognition sensitivity was higher for the entity noun (truck) than for the relational noun (limitation). 
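Recognition sensitivity in old/new paradigms like this one is commonly summarized with the signal detection index d′ = z(hit rate) − z(false-alarm rate), which separates discrimination of studied from new items from a general bias to respond "old." The text does not specify which index Asmuth and Gentner (2017) used; the sketch assumes d′ and uses hypothetical hit and false-alarm rates.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    # inv_cdf requires 0 < p < 1; rates of exactly 0 or 1 need a correction first.
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates: the entity noun is discriminated better than the relational noun.
entity = d_prime(hit_rate=0.80, false_alarm_rate=0.20)
relational = d_prime(hit_rate=0.70, false_alarm_rate=0.30)
print(f"entity noun d' = {entity:.2f}, relational noun d' = {relational:.2f}")
```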
More tellingly, recognition of relational nouns suffered when they were paired with a new entity noun at test (e.g., book limitation)—but this decrement was not found for entity nouns, which were recognized equally well with a new relational noun (e.g., truck threat) as with the original relational noun. Thus, the relational nouns had adapted their meaning to the entity nouns, but not the reverse. This pattern mirrors Kersten and Earles’ findings for noun–verb sentence memory discussed above. Asmuth and Gentner showed that this effect held regardless of word order (e.g., for both tooth opponent and opponent tooth) and also when controlling for the abstractness of the nouns—evidence for the role of the relationality of meaning in driving mutability, over and above other influences. The idea that semantic factors cut across form‐class distinctions in influencing sentence processing has recently been gaining currency in cognitive neuroscience. In a review of the cognitive neuroscience literature on differences in noun and verb processing, Vigliocco et al. (2011) showed that the key distinctions in processing at the cortical level are not defined by form‐class distinctions between nouns and verbs but rather by the semantics of the concepts they refer to. Recent fMRI work comparing noun and verb processing has shown that when these semantic differences are controlled for (e.g., testing only words that refer to events), nouns and verbs generate similar patterns of cortical activation (Cardillo et al., 2012; Vigliocco et al., 2011, 2006; Vigliocco, Vinson, & Siri, 2005). A study by Cardillo et al. (2012) demonstrated that this pattern holds in metaphor processing as well. They conducted an fMRI study that measured cortical activation when people read either noun metaphors or verb metaphors. Crucially, the noun and verb metaphors were matched semantically such that the verbs used were all denominalized verbs (derived from nouns). 
For example, for the noun metaphor “her smile was a cat's purr,” the corresponding verb metaphor “the flowers purred in the sunlight” was also tested. Cardillo et al. found no differences in cortical activation between the noun and verb metaphors, suggesting that semantics, rather than syntactic class per se, was the key factor driving metaphor processing. These findings converge with those of Asmuth and Gentner (2017) in pointing to relationality as a major factor driving verb mutability.

Relationality and the predicate role combine to drive verb mutability and online adjustment

Based on the above discussion, we propose that verb mutability is driven by both semantic factors (that verb meanings tend to be relational) and pragmatic factors (that verbs play the predicate role). These factors compound in driving verbs’ greater propensity for online adjustments to their meanings. One specific proposal is that relational concepts like verb meanings have greater interactive potential than object concepts (Gentner, 1981). The idea is that verb representations include relations that take external arguments (e.g., CAUSE(Event(X, Y), Event(Y, Z)), where X, Y, and Z are external participants). Entity noun representations, in contrast, have comparatively few external relations. Verbs’ higher interactive potential means that they are “relatively more subject than nouns to external contextual influences and less constrained by internal influences” (Gentner, 1981, p. 175). Compounding these semantic pressures is the pragmatic pressure exerted by the predicate role, which requires that the verb meaningfully relate to its external noun argument(s). This will often require adjusting one or more of the verb's typical semantic components, as in our studies.
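To make the notion of interactive potential concrete, the sketch below encodes the CAUSE example as a nested structure and collects the open argument slots that context must fill. The class names and the entity-noun structure are hypothetical, chosen for illustration; the notation in the text is informal.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Var:
    """An external participant slot, to be filled by an argument such as a noun's referent."""
    name: str


@dataclass(frozen=True)
class Pred:
    """A relation applied to its arguments (a property if it has no arguments)."""
    name: str
    args: Tuple["Term", ...] = ()


Term = Union[Var, Pred]


def external_args(term: Term) -> Set[str]:
    """Collect the external participant slots a representation leaves open."""
    if isinstance(term, Var):
        return {term.name}
    slots: Set[str] = set()
    for arg in term.args:
        slots |= external_args(arg)
    return slots


X, Y, Z = Var("X"), Var("Y"), Var("Z")
# CAUSE(Event(X, Y), Event(Y, Z)): a relational, verb-like structure.
verb_like = Pred("CAUSE", (Pred("Event", (X, Y)), Pred("Event", (Y, Z))))
# A hypothetical entity-noun structure built only from intrinsic properties.
noun_like = Pred("TIGER", (Pred("STRIPED"), Pred("FELINE")))

print(sorted(external_args(verb_like)))  # ['X', 'Y', 'Z']
print(sorted(external_args(noun_like)))  # []
```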

Implications and future work

From the theoretical discussion above, one would expect these findings to generalize to transitive sentences. Gentner and France (1988, Experiments 3a and 3b) found evidence that verbs adjust their meanings to those of their direct objects. For example, a sample paraphrase of Marvin discarded a doctor was “Marvin consulted a different practitioner of medicine”. However, the generality of this pattern and its relation to polysemy need further investigation. Our findings also lead to the intriguing prediction that nouns and verbs should have different characteristic patterns of word senses. First, the patterns found here suggest that verbs’ greater polysemy, relative to nouns, is the result of their greater propensity for online adjustment. Furthermore, there should be qualitative differences between verbs and nouns in their characteristic word senses. Specifically, verbs should have many word senses that are metaphorically/analogically related to the verb's literal meaning. Nouns should have many metonymic word senses and fewer metaphoric senses. We have found preliminary evidence for this prediction (King, Gentner, & Mo, 2021, 2022). If this pattern holds, it will provide another link between synchronic processes of sentence comprehension and diachronic processes of word‐sense formation.

Conclusion

We have shown that verb meanings are more mutable than noun meanings: under semantic strain, verb meanings are altered to a greater degree than noun meanings, with the verb's degree of change increasing as strain increases. We further showed that, although sense selection plays a role for both nouns and verbs, the verb mutability effect is driven chiefly by online adjustment. Beyond the difference between nouns and verbs in the degree of meaning change under strain, we also found qualitative differences in how nouns and verbs change meaning. Whereas nouns were likely to be paraphrased with a taxonomically or associatively related term, verbs were most likely to be paraphrased metaphorically. These findings bear on the nature and processing of verb metaphors, an important and underexplored aspect of language use. Finally, these results provide a link between synchronic processing and diachronic change over language evolution.

Stimulus sentence (Experiment 2)	Original paraphrase (Experiment 2)	Double paraphrase (Experiment 3)
The motor complained	The engine did not work well	The motor functioned badly

	Original sentence	Paraphrase
1.	The professor complained	The adult whined
2.	The bell complained	The alarm rang annoyingly
3.	The box complained	The container would not close.

Item	Mg	Mc	D	N	Total	Prop. Excluded
The lizard agreed	3	15	0	0	18	0.83
The lantern agreed	6	9	0	2	17	0.65
The lizard worshipped	7	11	0	1	19	0.63
The mule worshipped	7	11	0	0	18	0.61
The car agreed	8	10	0	0	18	0.56
The lantern worshipped	8	10	0	0	18	0.56
The car worshipped	8	9	0	0	17	0.53
The mule agreed	11	8	0	0	19	0.42
The lantern shivered	14	2	2	0	18	0.22
The lantern limped	15	3	0	1	19	0.21
The car limped	15	1	2	0	18	0.17
The mule cooked	15	0	1	2	18	0.17
The lantern cooked	16	0	1	2	19	0.16
The daughter cooked	15	0	0	2	17	0.12
The car cooked	16	0	1	1	18	0.11
The daughter limped	16	0	2	0	18	0.11
The lantern softened	16	0	1	1	18	0.11
The mule shivered	16	0	2	0	18	0.11
The politician shivered	16	0	2	0	18	0.11
The car shivered	17	2	0	0	19	0.11
The lizard cooked	17	0	2	0	19	0.11
The daughter worshipped	17	0	1	0	18	0.06
The lizard softened	17	0	1	0	18	0.06
The politician agreed	17	0	1	0	18	0.06
The politician cooked	17	0	1	0	18	0.06
The car softened	18	0	1	0	19	0.05
The daughter agreed	18	0	1	0	19	0.05
The daughter shivered	18	0	1	0	19	0.05
The mule softened	18	0	1	0	19	0.05
The politician worshipped	18	0	0	1	19	0.05
The daughter softened	18	0	0	0	18	0
The lizard limped	18	0	0	0	18	0
The lizard shivered	17	0	0	0	17	0
The mule limped	17	0	0	0	17	0
The politician limped	19	0	0	0	19	0
The politician softened	17	0	0	0	17	0
Total	526	91	24	13	654	0.20
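The Prop. Excluded column can be recomputed from the counts. In every row it matches (Total - Mg) / Total, that is, the share of paraphrases not coded Mg. This reading is our inference from the numbers, not a definition given with the table. The short Python sketch below checks it for a few rows and for the Total row. The names `ROWS`, `TOTALS` and `prop_excluded` are illustrative only.

```python
# Counts copied from rows of the table above: (Mg, Mc, D, N, Total).
ROWS = {
    "The lizard agreed": (3, 15, 0, 0, 18),
    "The mule agreed": (11, 8, 0, 0, 19),
    "The lantern shivered": (14, 2, 2, 0, 18),
    "The daughter softened": (18, 0, 0, 0, 18),
}
TOTALS = (526, 91, 24, 13, 654)


def prop_excluded(mg, mc, d, n, total):
    """Share of paraphrases not coded Mg: (Total - Mg) / Total.

    Assumption: this is how "Prop. Excluded" was derived; it is inferred
    from the table's numbers, not stated in the source.
    """
    # The four coding categories should partition each item's paraphrases.
    if mg + mc + d + n != total:
        raise ValueError("category counts must sum to Total")
    return (total - mg) / total


for item, counts in ROWS.items():
    print(f"{item}: {prop_excluded(*counts):.2f}")
print(f"Total: {prop_excluded(*TOTALS):.2f}")  # 128 / 654 -> 0.20
```

Rows whose proportion falls exactly on a half may print differently from the table, depending on the rounding rule. An example is The motor complained in the next table (5 / 40 = 0.125). Python's `.2f` formatting rounds exact ties to even.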

Item	Mg	Mc	D	N	Total	Prop. Excluded
The box paused	19	18	4	2	43	0.56
The box complained	20	19	3	0	42	0.52
The tree complained	20	16	3	1	40	0.50
The tree failed	26	12	5	0	43	0.40
The box suffered	25	9	5	0	39	0.36
The bell complained	29	7	6	0	42	0.31
The tree burned	29	0	10	3	42	0.31
The tree paused	29	5	7	1	42	0.31
The box failed	31	8	2	1	42	0.26
The tree dried	30	0	9	1	40	0.25
The professor failed	32	0	10	0	42	0.24
The professor suffered	33	0	10	0	43	0.23
The tree suffered	33	4	5	0	42	0.21
The motor suffered	34	5	2	1	42	0.19
The bell suffered	33	2	3	2	40	0.18
The professor paused	33	0	6	1	40	0.18
The queen failed	33	0	6	1	40	0.18
The box burned	34	0	5	1	40	0.15
The queen burned	34	0	5	1	40	0.15
The motor complained	35	1	4	0	40	0.13
The box dried	37	0	2	3	42	0.12
The queen paused	37	0	5	0	42	0.12
The motor burned	38	0	5	0	43	0.12
The bell failed	36	0	4	0	40	0.10
The queen suffered	38	1	3	0	42	0.10
The professor dried	38	0	1	2	41	0.07
The bell paused	39	0	2	1	42	0.07
The motor dried	39	0	2	1	42	0.07
The queen dried	39	0	2	1	42	0.07
The queen complained	40	0	3	0	43	0.07
The professor complained	40	0	2	0	42	0.05
The motor paused	38	0	1	0	39	0.03
The bell burned	41	0	0	1	42	0.02
The motor failed	41	0	1	0	42	0.02
The professor burned	41	0	1	0	42	0.02
The bell dried	43	0	0	0	43	0
Total	1217	107	144	25	1493	0.18

Note. The Meaningful (Mg) and Total counts shown here (1217 and 1493) differ from those in the final Experiment 2 analysis (1216 and 1491, respectively). The difference arises because two paraphrases generated null vectors in word2vec (i.e., they contained no words present in word2vec's dictionary). Excluding both from the 1493 paraphrases generated in Experiment 2 leaves 1491 for analysis. One of the two had already been excluded during coding, so the 1217 paraphrases coded as meaningful included the other null-vector paraphrase, leaving 1216 meaningful paraphrases for analysis.
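The exclusions in the note follow from how a sentence-level vector is built from word vectors. This section does not spell out the composition rule, so the sketch below assumes one common approach: average the word2vec vectors of a paraphrase's in-vocabulary words. Under that assumption, a paraphrase with no known words has nothing to average and yields a null vector.

`TOY_VECTORS` is a hypothetical three-dimensional stand-in for word2vec's dictionary. `paraphrase_vector` is a hypothetical helper, not the authors' code. The last lines reproduce the count reconciliation stated in the note.

```python
# Hypothetical 3-dimensional toy vectors standing in for word2vec's dictionary;
# real word2vec embeddings have hundreds of dimensions and a large vocabulary.
TOY_VECTORS = {
    "engine": [0.9, 0.1, 0.0],
    "made": [0.2, 0.7, 0.1],
    "strange": [0.1, 0.2, 0.8],
    "noises": [0.3, 0.3, 0.6],
}


def paraphrase_vector(text, vectors=TOY_VECTORS):
    """Average the vectors of in-vocabulary words (assumed composition rule).

    Returns None when no word is in the vocabulary: the "null vector" case.
    """
    # Simple whitespace tokenization with trailing punctuation stripped.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# "The" is out of vocabulary and skipped; the other four words are averaged.
print(paraphrase_vector("The engine made strange noises."))
print(paraphrase_vector("Zorp blick"))  # None: excluded from analysis

# Reconciling the Experiment 2 counts given in the note.
generated, coded_meaningful = 1493, 1217
null_vectors, null_and_meaningful = 2, 1
analyzed_total = generated - null_vectors                    # 1491
analyzed_meaningful = coded_meaningful - null_and_meaningful  # 1216
print(analyzed_total, analyzed_meaningful)
```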
