An inclusive, real-world investigation of persuasion in language and verbal behavior.

Vivian P Ta¹, Ryan L Boyd², Sarah Seraj³, Anne Keller¹, Caroline Griffith^1,4, Alexia Loggarakis^1,5, Lael Medema¹.

Abstract

Linguistic features of a message necessarily shape its persuasive appeal. However, studies have largely examined the effect of linguistic features on persuasion in isolation and do not incorporate properties of language that are often involved in real-world persuasion. As such, little is known about the key verbal dimensions of persuasion or the relative impact of linguistic features on a message's persuasive appeal in real-world social interactions. We collected large-scale data of online social interactions from a social media website in which users engage in debates in an attempt to change each other's views on any topic. Messages that successfully changed a user's views are explicitly marked by the user themselves. We simultaneously examined linguistic features that have been previously linked with message persuasiveness between persuasive and non-persuasive messages. Linguistic features that drive persuasion fell along three central dimensions: structural complexity, negative emotionality, and positive emotionality. Word count, lexical diversity, reading difficulty, analytical language, and self-references emerged as most essential to a message's persuasive appeal: messages that were longer, more analytic, less anecdotal, more difficult to read, and less lexically varied had significantly greater odds of being persuasive. These results provide a more parsimonious understanding of the social psychological pathways to persuasion as it operates in the real world through verbal behavior. Our results inform theories that address the role of language in persuasion, and provide insight into effective persuasion in digital environments.

Keywords: Attitude change; Language; Online interactions; Persuasion

Year: 2021 PMID: 34869936 PMCID: PMC8633087 DOI: 10.1007/s42001-021-00153-5

Source DB: PubMed Journal: J Comput Soc Sci ISSN: 2432-2725

Introduction

Understanding persuasion (how people can fundamentally alter the thoughts, feelings, and behaviors of others) is a cornerstone of social psychology. Historically, social influence has been exceptionally difficult to study in the real world, requiring researchers to piece together society-level puzzles either in the abstract [1] or through carefully crafted field studies [2]. In recent years, technology has driven interest in studying social influence, as digital traces make it possible to study how the behaviors of one individual or group cascade to change others’ behaviors [3, 4]. Nevertheless, most social processes are complex, to the point where they are very difficult to study as they operate outside of the lab. However, the availability of digital data and computational techniques provides a ripe opportunity to begin understanding the precise mechanisms by which people influence the thoughts and feelings of others. Today, persuasion is often transacted, partially or wholly, through verbal interactions that take place on the internet [5]: a message is transmitted from one person to another through the use of language, altering the recipient’s attitude. As such, researchers have sought to identify linguistic features that are linked to a message’s persuasive appeal. A relatively sizable number of linguistic features that are important in message persuasiveness have emerged from this body of research and include features that indicate what a message conveys as well as how it was conveyed (Table 1). Models of persuasion, such as the Elaboration Likelihood Model (ELM) [6], have been used to identify these linguistic features and explain how they affect message persuasiveness.

Table 1

Summary of linguistic features and predictions

Measure	Description	Positively predictive of persuasion	Negatively predictive of persuasion	Positively and negatively predictive of persuasion or inconclusive
Word count	Raw word count	O’Keefe [51], O’Keefe [52], Calder et al. [53]	Hamilton and Mineo [54]	Petty and Cacioppo [6]; Wood et al. [55]
Self-references	References to oneself	Tan et al. [20]	Toma and D’Angelo [56]	Slater and Rouner [57]
Certainty	Words that denote confidence/certainty	Ahmad and Laroche [58]		Karmarkar and Tormala [59]
Analytical thinking	Formal, logically-ordered, and analytical thinking	Xiao [60]		Kaufman et al. [61], Slater and Rouner [57], Allport and Postman [62]
Language emotionality (valence, arousal, dominance)	Valence (the degree of unpleasantness-pleasantness), Arousal (the intensity of emotion generated), and Dominance (the degree of control exerted)	Hazleton et al. [63]	Toma and D’Angelo [56]	Ahmad and Laroche [58], East et al. [38], Wegener et al. [64], Petty and Cacioppo [6], Tan et al. [20]
Hedges	A term or phrase that is ambiguous and lacks clear precision, often used to soften a message	Tan et al. [20]		Hanauer et al. [29], Hosman and Siltanen [65], Gibbons et al. [66], Hosman [67], Holtgraves and Lasky [68], Blankenship and Holtgraves [69], Toulmin [70]
Examples		Tan et al. [20], Baesler and Burgoon [71]
Abstract/ concrete	The degree to which language is conceptual and refers to intangible qualities (abstraction) and exudes perceptibility and contextualizes information (concreteness)			Doest et al. [72], Schwanenflugel and Stowe [73], Seifert [74], Douglas and Sutton [75], Hansen and Wanke [76], Larrimore et al. [13], Toma and D’Angelo [56], Pan et al. [77]
Reading difficulty	The amount of effort that is required to understand a piece of text, measured via the SMOG Index	Goering [78]		Xu et al. [79]
Lexical Diversity	The richness and range of vocabulary measured via type-token ratio		Tan et al. [20]	Bradac et al. [80, 81], Daller et al. [82]

Despite the impressive corpus of studies to date, the existing literature has several limitations. Studies have largely examined the effect of linguistic features on persuasion in isolation by only focusing on a small number of linguistic features (i.e., one or two) at a time. While this body of literature has collectively identified a relatively sizable number of linguistic features that are linked to message persuasiveness, it remains unclear how these links, taken together, inform the social aspects of verbal behavior in persuasion. In other words, what do the linguistic features connected with message persuasiveness reveal about the key verbal behaviors involved in persuasion? As language provides “a rich stream of ongoing social processes” [7], synthesizing these findings can provide a more complete understanding of the social psychological pathways to persuasion. In the same vein, real-world messages are constructed using a varied combination of linguistic features to transmit complex thoughts, emotions, and information to others. Nevertheless, studies tend to examine how a single linguistic feature (or a small set of features) correlates with persuasion without taking into account other potentially important linguistic features within a given message [8, 9]. The meaning of a given word or feature in any text depends on the context in which it is used, which can be inferred from the words and features that surround it [10, 11]. As such, the effect of any particular linguistic feature on message persuasiveness can be attenuated by the presence of other features in the message. Because these features are typically studied in isolation, little is known about the relative impact of linguistic features on a message’s persuasive appeal.
Furthermore, studies that examine the effect of linguistic features on persuasion tend to focus on persuasion in terms of engaging in specific behaviors [3, 12–14] rather than changing attitudes in general. Persuading people to engage in a specific behavior is conceptually distinct from changing people’s attitude on a topic. Although changes in behavior can facilitate changes in attitude, changes in behavior can also be dependent on attitude change (e.g., an individual may not engage in behavior change unless they believe that the behavior will result in a desirable outcome). Moreover, changes in behavior do not always indicate that attitude change has occurred (e.g., an individual may decide to ultimately receive the COVID-19 vaccine because their employer requires it and not because their views regarding vaccines have changed) [15]. Finally, many studies that investigate the effect of linguistic features on persuasion are conducted in controlled lab settings [16, 17] due to the sheer difficulty of studying persuasion as it unfolds in the real world. Given that persuasion often takes place through online social interactions [5], there is a need to study persuasion in this setting. Doing so also enables researchers to better understand how digital environments influence the process of persuasion, especially as digital environments are now progressively constructed to influence the attitudes and behaviors of users [18] and there is “little consensus on how to persuade effectively within the digital realm” [19]. We sought to address these limitations in the current study. Specifically, we collected large-scale data from r/ChangeMyView, an online public forum on the social media website Reddit where users engage in debates in an attempt to change each other’s views on any topic. Most importantly, messages that successfully changed a user’s views are explicitly marked by the user themselves.
That is, individuals are exposed to several messages and explicitly identify the message(s) that actually changed their views. We simultaneously examined linguistic features that have been previously linked with message persuasiveness (Table 1) between persuasive and non-persuasive messages to test the following research questions. RQ 1: What are the key linguistic dimensions of persuasion? Given that a relatively sizable number of linguistic features have been linked with persuasion, we first sought to determine whether these features could be meaningfully reduced to a smaller number of dimensions representing the key verbal processes of persuasion. We then assessed whether these dimensions were uniquely predictive of persuasion when controlling for the effects of the remaining dimensions. RQ 2: Which individual linguistic features, when assessed simultaneously, are the most essential and relevant to a message’s persuasive appeal? We assessed all linguistic features that have been linked with message persuasiveness in a single model to examine the relative impact of the features on a message’s persuasive appeal and to identify the features that were most crucial to message persuasiveness. While theory-driven predictions can be made regarding how each linguistic feature relates to persuasion, there has been a considerable amount of variability across studies in terms of which features positively or negatively relate to persuasion, and some studies show mixed or inconclusive results pertaining to the effect of a given linguistic feature on persuasion (see Table 1). Given that our primary goal was to obtain a more unified understanding of the social psychological pathways to persuasion via language, the current study is guided by a jointly data-driven and exploratory approach, with results informing our understanding of the directional relationship between the linguistic features and message persuasiveness.
Overall, assessing the interplay between important linguistic features on persuasion using large-scale, real-world data helps inform theories, such as ELM, that address how linguistic features influence persuasion, and provides a parsimonious and ecologically valid understanding of the social psychological processes that shape persuasion. Although some previous studies have used r/ChangeMyView data to investigate the effect of linguistic features on persuasion, they differ from the current investigation in important ways. The types and combinations of linguistic features that have been examined vary across studies and typically feature a mix of linguistic features that have and have not been linked to persuasion. For example, Tan et al. [21] examined how some persuasion-linked linguistic features (including arousal, valence, reading difficulty, and hedges), some non-persuasion-linked features (e.g., formatting features such as use of italics and boldface), and interaction dynamics (e.g., the time a replier enters a debate) were associated with successful persuasion. Wei et al. [22] investigated how surface text features (e.g., reply length, punctuation), social interaction features (e.g., the number of replies stemming from a root comment), and argumentation-related features (e.g., argument relevance and originality) related to persuasion. Musi et al. [23] assessed the distribution of argumentative concessions in persuasive versus non-persuasive comments, and Priniski and Horne [24] examined persuasion through the presentation of evidence only in sociomoral topics. Moreover, studies tend to place greater emphasis on building models that accurately detect persuasive content online than on interpretability and a more unified understanding of the social psychological pathways to persuasion via language. For instance, Khazaei et al. [20] assessed how all LIWC-based features varied across persuasive and non-persuasive replies and used this information to train a machine learning model to identify persuasive responses.

Method

Data collection

We used data from the Reddit sub-community (i.e., “subreddit”) r/ChangeMyView, a forum in which users (referred to as “original posters”, or “OPs”) post their own views on any topic and invite others to debate them. Those who debate the OP (referred to as “repliers”) reply to the OP’s post in an attempt to change the OP’s view. The OP will award a delta (∆) to particular replies that changed their original views. Using data from r/ChangeMyView presents several advantages. All replies in r/ChangeMyView are written with the purpose of persuasion. The replies that successfully change an OP’s view are explicitly marked by the OP themselves, allowing for a sample of persuasive and non-persuasive replies. All OPs and repliers must adhere to the official policies of r/ChangeMyView. For instance, OPs are required to explain at a reasonable length (using 500 characters or more) why they hold their views and to interact with repliers within a reasonable time frame. Replies must be substantial, adequate, and on-topic. Because these policies are enforced by moderators, the resulting interactions are high in quality [21] and are conducted under similar conditions with similar expectations. OPs can also post their view on any topic, allowing for an examination of persuasion across a wide variety of topics. All top-level replies (direct replies to the OP’s original statement of views) posted between January 2013 and October 2018 were initially collected from the Pushshift database [25]. We focused only on the top-level replies and omitted any additional replies that were in response to a direct reply (i.e., a direct reply’s “children”). Because deltas can also be awarded to downstream replies, this ensured that a reply was deemed persuasive because of its own contents and not because of any resulting “back-and-forth” interactions.
We also omitted any top-level replies that were made by a post’s OP and any replies that received a delta in which the delta was not awarded by the OP. Because the data contained a substantially greater number of non-persuasive replies (99.39%) than persuasive ones, analyses were conducted on a balanced subsample that included all top-level replies that were awarded a delta and a random subsample of top-level replies that were not awarded a delta that came from the original posts in which at least one delta was awarded. This allowed us to compare the persuasive and non-persuasive replies from the same original post while bypassing issues associated with class imbalances [26]. As an example, consider a parent post that garnered two top-level replies that were awarded a delta, and three top-level replies that were not awarded a delta. In this case, the two top-level replies that were awarded a delta were included in the subsample and two out of the three top-level replies that were not awarded a delta would be randomly selected for inclusion in the subsample. Using the random number generator in Microsoft Excel, the 3 top-level replies that were not awarded a delta were assigned a random number between 1 and 100. Replies with the lowest two values were then selected for inclusion in the subsample. Parent posts almost always contained a greater number of top-level replies that were not awarded a delta than top-level replies that were awarded a delta. However, for the very few instances in which a parent post contained a greater number of top-level replies that were awarded a delta than top-level replies that were not awarded a delta, we included all top-level replies in the subsample (N = 9020 top-level replies; n = 4515 top-level replies that were awarded a delta; n = 4505 top-level replies that were not awarded a delta). Example persuasive and non-persuasive replies can be found in Table 2.
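The balanced subsampling described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the study drew random numbers in Microsoft Excel, whereas this sketch uses Python's `random.sample`, and the record fields (`post_id`, `reply_id`, `delta`) are hypothetical names chosen for the example.

```python
import random
from collections import defaultdict


def balanced_subsample(replies, seed=0):
    """Keep every delta-awarded reply plus an equal-sized random draw of
    non-delta replies from the same parent post.

    `replies` is a list of dicts with hypothetical keys
    'post_id', 'reply_id', and 'delta' (bool).
    """
    rng = random.Random(seed)
    by_post = defaultdict(lambda: {"delta": [], "no_delta": []})
    for reply in replies:
        key = "delta" if reply["delta"] else "no_delta"
        by_post[reply["post_id"]][key].append(reply)

    sample = []
    for groups in by_post.values():
        persuasive, others = groups["delta"], groups["no_delta"]
        if not persuasive:
            continue  # only posts with at least one delta are used
        sample.extend(persuasive)
        # If a post has fewer non-delta than delta replies, keep them all.
        k = min(len(persuasive), len(others))
        sample.extend(rng.sample(others, k))
    return sample
```

Applied to the worked example in the text (two delta replies and three non-delta replies under one parent post), the function returns four replies: both delta replies and two randomly chosen non-delta replies.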

Table 2

Example replies

	Result
Where diving and embellishing do cause harm is where they interfere with call making abilities of referees. If fans or officials give referees a hard time for incorrect calls, which were dives or embellished by the player, then the referee will become more and more skeptical of future fowls or calls. Further leading to games being less fair for the players who engage in it I'll single out embellishing, if referees are conditioned by the player base to look for certain types of reactions for fowls (ie, the "neck snap"), it gives a disadvantage to players who do not engage in that kind of display. And, by dramatizing a certain fowl, makes it easier to recreate and convincingly ''dive'' Furthermore, there are sports in which the severity of the fowl impact the decision made by the referee. In rugby for instance, a referee may award a penalty which is sufficiently severe a free try(which is like a touch down). Embellishing takes the ability away from the referee to correctly judge the call	Persuasive (∆ awarded)
If you had an illness that was not depression would you feel bad about taking medication to cure it? Or, in some cases, to just be able to live without having to many problems? Do you see a diabetic who has to depend on insulin as a drug addict? Depression is actually a pretty complex illness. You might have the same symptoms as another person and it still could be for different reasons If you have time, read this: So, with all those different things going on in depression there are also different ways of helping people who have depression. There does not seem to be a one size fits all treatment. Different drugs try in different ways to right things that might have gone wrong in the brain When somebody starts taking antidepressants they do not magically feel better. It often takes a few weeks for anything to set in at all. That is different to the sort of drugs you get high on, as they work pretty fast Sometimes a person does not feel better on an antidepressant at all and sometimes they feel worse. Sometimes they feel better but the side effects are not worth it.—Alcohol always makes you drunk in a relatively predictable way, right (even though some people act different when drunk than others)? Antidepressants are not so predictable If an antidepressant works though, that is pretty great. Mood starts to improve slowly and you start to realize things you did not even take into account before anymore. Things that are just as valid an real as what you noticed while being depressed. Yes, your friends care for you and yes, you are worth it. The person sounding all frustrated when you talk with them about how you are feeling?—They just want to help but have no idea how, they are not 'just annoyed"".—Sure, there might be no life after death, but what stops you from having one while you ARE alive? No, how you are feeling right now is not invalid and I know how frustrating it is when people just say to “look on the bright side of life”. 
You see, when you are not depressed you can be sad and then you can stop being sad. Sometimes you can even make yourself stop being sad. Your friends are trying to help you, they are seeing good things going on along with the bad things and they don't know that to you everything is just somewhat worthless, unimportant, empty. You say you are depressed and they hear you are sad. So, they try to tell you that you do not need to be sad, that things are all right. (If you are like me back when I had my depression you know that there is no real big reason to be depressed and being reminded of that makes things even worse, 'cause it does not stop how you are feeling and nobody seems to realize that…) Now, is depression making you see the world more real than the "normal" view on life? You are talking about ""Depressive Realism"" and it is actually a thing people study. Findings are not fully conclusive and different people argue different things based on different studies and meta studies It's another interesting read: One more thing: You remember how I said non depressed people can be sad, but also stop being sad?—When you are on working antidepressants you will also still be able to feel sad. You will not suddenly always be happy. You will be able to feel horrible if something bad happens.—But you also will be able to feel great when things happen that are great. Able to feel alive Oh, and not all people stay on antidepressants. For many it is just a tool, a medication to help them get better, till there brain has fixed itself and can work right on it's own. I am one of those people Sorry if this was a bit to rambly, I fear I might have tried to address to many points at once	Persuasive (∆ awarded)
"Worth" does not exist independently. Things are worth something to someone. It could really be any sort of theoretical being, but for simplicity's sake, let's say it's you As you say, you have a limited lifespan. Someday you will be dead and gone. Nothing can be worth anything to you in a time when you do not exist. While you exist, there is possible worth. If you do not exist, worth is impossible. Life is worth living because it is the only way for anything to be worth anything at all That's the purely logical, philosophical approach. I'll throw in something that cuts a little more to the human side now This post is likely the product of a number of realizations. There is no god. All things die. We live in a physical Universe in which all things are bound by physical laws. Seems rather mundane, yeah? Wrong! The inanimate matter of the Universe has somehow managed to complexly weave itself into lifeforms and coat the earth in organic matter, but more importantly has created lifeforms that are self-aware and contemplative. It's the most amazing phenomena in existence. Supernovas are cool and all, but what's way cooler is a being that can think about supernovas Sure, nothing is eternal, but why is that problematic? Are things really only worthwhile if they last forever? Remember, nothing can be of worth to you if you don't exist. The only things that are worth anything at all are the things you can do during your life I leave you with a quote from Stanley Kubrick: "The most terrifying fact about the universe is not that it is hostile but that it is indifferent; but if we can come to terms with this indifference and accept the challenges of life within the boundaries of death however mutable man may be able to make them “ our existence as a species can have genuine meaning and fulfilment. However vast the darkness, we must supply our own light	Not persuasive (no ∆ awarded)
First of all, who should kill the killers? Is that not hypocritical? Someone murders another human, and for that, we propose to murder the killer? But then there’s the fact we rarely have 100% certainty of someone’s guilt. With the exception of things like video evidence, there's always things that could go wrong. Eyewitnesses could conspire against the accused, choosing to lie for a conviction, or the accused could be sentenced on evidence later found to be flimsy. And the problem is: we don't have a good way to know when that's the case And then there's the severe costs. You talk about tax money. Yet, the average death sentence costs more to process than locking the person up, mostly due to how we require high levels of certainty (but rarely exact) And finally, the death sentence focuses on punishment. But it does so in a way that could never allow the accused to repent or change. The American justice system (for example), has higher rates of re-activism than places such as Norway, which focus not on the punishment, but on rehabilitation Instead of killing the killers, why not help them? Take pity to them. There's obviously something wrong	Not persuasive (no ∆ awarded)

Note: All example replies were derived from different parent posts

To gain an initial understanding of the types of topics that were raised for debate in the subreddit, we randomly selected 100 replies from the final dataset and manually coded their content. Six overarching topics emerged: legal and politics; race, culture, and gender; business and work; science and technology; behavior, attitudes, and relationships; and recreation. More information regarding debated topics can be found in the supplementary materials.

Linguistic features

Prior to extracting linguistic features from our data, we conducted a cursory search of the psychological literature to identify prominent linguistic features reported to have a significant relationship with message persuasiveness in at least one published study. These linguistic features are listed in Table 1. Each reply in the r/ChangeMyView dataset was analyzed separately using Linguistic Inquiry and Word Count (LIWC) [27], which calculates the percentage-use of words belonging to psychologically or linguistically meaningful categories. We used LIWC to quantify word count, analytic thinking (analytical thinking formula = articles + prepositions − personal pronouns − impersonal pronouns − auxiliary verbs − conjunctions − adverbs − negations; relative frequencies are normalized within LIWC2015 to a 0-to-100 scale, with higher scores reflecting more analytical language and lower scores reflecting more informal and narrative-like language), the percentage-use of self-references (i.e., first-person singular pronouns, or “i-words”), and the percentage-use of certainty terms in each reply within our corpus. Dictionaries of terms that have been rated on emotionality (i.e., valence, arousal, and dominance) from [28] were imported into LIWC to measure the percentage-use of language that scored high and low on valence, arousal, and dominance. A dictionary of hedges from [29] was also imported into LIWC to measure the percentage-use of hedges. Following [21], the use of examples was measured by occurrences of “for example”, “for instance”, and “e.g.”.
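Two of the measures above are simple enough to sketch directly. The following is a rough illustration only: it does not reproduce LIWC, its dictionaries, or the 0-to-100 normalization applied within LIWC2015, and the dictionary keys and function names are hypothetical.

```python
import re

# The three example markers named in the text.
EXAMPLE_MARKERS = re.compile(
    r"\bfor example\b|\bfor instance\b|\be\.g\.", re.IGNORECASE
)


def count_example_markers(text):
    """Count occurrences of 'for example', 'for instance', and 'e.g.'."""
    return len(EXAMPLE_MARKERS.findall(text))


def raw_analytic_score(pct):
    """Unnormalized analytic composite from category percentages.

    `pct` maps hypothetical category names to percentage-use values.
    LIWC2015 further rescales this composite to 0-100; that step is
    not reproduced here.
    """
    return (
        pct["articles"]
        + pct["prepositions"]
        - pct["personal_pronouns"]
        - pct["impersonal_pronouns"]
        - pct["auxiliary_verbs"]
        - pct["conjunctions"]
        - pct["adverbs"]
        - pct["negations"]
    )
```

The composite rises with articles and prepositions and falls with pronouns, auxiliary verbs, conjunctions, adverbs, and negations, which is why higher values correspond to more formal, less narrative text.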
Language abstraction/concreteness was measured using the linguistic category model, with higher scores indicating higher levels of language abstraction and lower scores indicating lower levels of language abstraction (i.e., greater language concreteness; formula for calculation = [(Descriptive Action Verbs × 1) + (Interpretative Action Verbs × 2) + (State Verbs × 3) + (Adjectives × 4)]/(Descriptive Action Verbs + Interpretative Action Verbs + State Verbs + Adjectives)) [30]. Type-token ratio, the ratio between the number of unique words in a message and the total number of words in the given message [31], was used to measure lexical diversity, with higher scores indicating greater lexical diversity (type-token ratio formula = number of unique lexical terms/total number of words). Last, reading difficulty was measured via the SMOG Index, which estimates the years of education the average person needs to completely comprehend a piece of text (SMOG Index formula = 1.0430 × √(number of polysyllables × (30/number of sentences)) + 3.1291). Because a higher SMOG score indicates that higher education is needed to comprehend a piece of text, higher reading difficulty scores represent text that is more difficult to read and lower scores represent text that is easier to read [32]. More information about these linguistic features and example replies that scored high and low on each linguistic feature are reported in the supplementary materials.
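The three formulas above translate directly into code. This is a minimal sketch under stated assumptions: tokens are assumed to be already split and are lowercased before counting, and polysyllable, sentence, and verb-category counts are assumed to come from an upstream tagger that is not shown. The function names are illustrative, not taken from the study's analytic code.

```python
import math


def type_token_ratio(tokens):
    """Unique tokens divided by total tokens (case-insensitive)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


def smog_index(n_polysyllables, n_sentences):
    """SMOG Index: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    if n_sentences <= 0:
        raise ValueError("n_sentences must be positive")
    return 1.0430 * math.sqrt(n_polysyllables * (30 / n_sentences)) + 3.1291


def lcm_abstraction(dav, iav, sv, adj):
    """Linguistic category model score from counts of descriptive action
    verbs, interpretative action verbs, state verbs, and adjectives.
    Ranges from 1 (most concrete) to 4 (most abstract)."""
    total = dav + iav + sv + adj
    if total == 0:
        raise ValueError("at least one coded term is required")
    return (dav * 1 + iav * 2 + sv * 3 + adj * 4) / total
```

For instance, the five-token message "the cat saw the dog" contains four unique tokens, giving a type-token ratio of 0.8. Longer messages tend to repeat words, which lowers this ratio as word count grows.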

Results

Given that a relatively sizable number of linguistic features have been linked with persuasion, we first determined whether these features could be meaningfully reduced to a smaller number of dimensions representing the key verbal processes of persuasion. Second, we determined whether these dimensions were each uniquely predictive of persuasion when controlling for the effects of the remaining dimensions. Third, we simultaneously assessed all linguistic features that have been linked with message persuasiveness in a single model to understand how linguistic features interact with one another to influence a message’s persuasive appeal and identify features most crucial to message persuasiveness. All data and analytic code can be found in the supplementary. Descriptive statistics, zero-order correlations between all variables, and complete analytic outputs for all analyses are presented in the supplementary. To identify the key linguistic dimensions of persuasion (RQ 1), we submitted all linguistic features into a principal components analysis (PCA) with a varimax rotation. Bartlett’s Sphericity Test (p < 0.001) and the Kaiser–Meyer–Olkin metric (KMO = 0.55) suggested that our data were suitable for analysis. Features with factor loadings greater than the absolute value of 0.50 were retained and used to quantify principal components. Three principal components were extracted that collectively accounted for 36.28% of the total variance: structural complexity, negative emotionality, and positive emotionality (see Table 3). Structural complexity had high loadings in the direction of lower lexical diversity, higher word count, and greater reading difficulty. Negative emotionality had high loadings in the direction of greater percentage-use of terms that scored low on valence and low on dominance. Positive emotionality had high loadings in the direction of greater percentage-use of terms that scored high on dominance, high on valence, and hedges.

Table 3

Results of PCA with Varimax Rotation

	Principal Components
Variables	Negative emotionality	Structural complexity	Positive emotionality
Low valence	0.89	0.02	0.04
Low dominance	0.88	0.06	0.05
Low arousal	− 0.21	0.03	0.12
High arousal	0.10	− 0.02	0.09
Lexical diversity	0.19	− 0.85	0.03
Word count	− 0.16	0.83	− 0.07
Reading difficulty	0.13	0.51	0.15
Analytic	0.05	0.34	− 0.25
Examples	0.01	0.06	0.03
High dominance	− 0.34	− 0.11	0.60
High valence	− 0.36	− 0.15	0.59
Hedges	0.06	0.11	0.57
Abstract/concrete	0.11	0.19	0.49
Certainty	0.03	0.03	0.36
Self-references	− 0.14	− 0.18	0.21
% of Variance	13.18%	12.70%	10.40%
Total variance	36.28%
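The retention rule stated in the Results (absolute loading greater than 0.50) can be checked against the loadings reported in Table 3. The sketch below only re-applies that threshold to the published values; it does not rerun the PCA or the varimax rotation, and the variable and function names are illustrative.

```python
COMPONENTS = ("Negative emotionality", "Structural complexity", "Positive emotionality")

# Loadings copied from Table 3, in the column order of COMPONENTS.
LOADINGS = {
    "Low valence": (0.89, 0.02, 0.04),
    "Low dominance": (0.88, 0.06, 0.05),
    "Low arousal": (-0.21, 0.03, 0.12),
    "High arousal": (0.10, -0.02, 0.09),
    "Lexical diversity": (0.19, -0.85, 0.03),
    "Word count": (-0.16, 0.83, -0.07),
    "Reading difficulty": (0.13, 0.51, 0.15),
    "Analytic": (0.05, 0.34, -0.25),
    "Examples": (0.01, 0.06, 0.03),
    "High dominance": (-0.34, -0.11, 0.60),
    "High valence": (-0.36, -0.15, 0.59),
    "Hedges": (0.06, 0.11, 0.57),
    "Abstract/concrete": (0.11, 0.19, 0.49),
    "Certainty": (0.03, 0.03, 0.36),
    "Self-references": (-0.14, -0.18, 0.21),
}


def retained_features(loadings, threshold=0.50):
    """Group features by the component on which |loading| exceeds threshold."""
    retained = {name: [] for name in COMPONENTS}
    for feature, values in loadings.items():
        for name, value in zip(COMPONENTS, values):
            if abs(value) > threshold:
                retained[name].append(feature)
    return retained


if __name__ == "__main__":
    for component, features in retained_features(LOADINGS).items():
        print(component, "->", ", ".join(features))
```

Under this rule, abstract/concrete language (0.49) narrowly misses retention on the positive emotionality component, and six features (low arousal, high arousal, analytic, examples, certainty, self-references) load on no component.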

Results of PCA with Varimax Rotation To assess if all three dimensions were uniquely important to message persuasiveness, we entered each component into a multilevel logistic regression analysis using lme4 [33]. This procedure corrects for non-independence of replies (i.e., replies to the same parent post) on the dependent variable: persuasion (delta awarded = 1, no delta awarded = 0). We include random intercepts for replies nested within parent posts and replies nested within repliers (i.e., some repliers provided replies to multiple original posts). All three components emerged as significant predictors of persuasion. For a one-unit increase in structural complexity, the odds of receiving a delta increase by a factor of 2.25, 95% CI [2.11, 2.39]. For a one-unit increase in negative emotionality, the odds of receiving a delta decrease by a factor of 0.89, 95% CI [0.85, 0.94]. For a one-unit increase in positive emotionality, the odds of receiving a delta also decrease by a factor of 0.92, 95% CI [0.88, 0.97]. Post-hoc power analyses conducted using the simr package in R (Version 1.0.5) [34] revealed that we had at least 96% power to detect a small effect (i.e., 0.15) for each of these factors on persuasion. Next, the individual linguistic features were assessed simultaneously to identify those that were the most essential and relevant to a message’s persuasive appeal (RQ 2). A logistic least absolute shrinkage and selection operator (LASSO) regression was performed using glmmLasso [35]. A LASSO regression is a penalized regression analysis that performs variable selection to prevent overfitting by adding a penalty (λ) to the cost function (i.e., the sum of squared errors) equal to the sum of the absolute value of the coefficients. This penalty results in sparse models with few coefficients. In other words, this method selects a parsimonious set of variables that best predict the outcome variable and has many advantages over other feature selection methods [36]. 
All linguistic features were entered into the LASSO regression model. A grid search was performed to identify the optimal shrinkage parameter (λ) based on the Bayesian information criterion (BIC). Five features emerged with nonzero coefficients: word count, lexical diversity, reading difficulty, analytical thinking, and self-references (Table 4).

Table 4

Results of LASSO regression

	LASSO regression
Variables	Estimate (SE)	z value
Word count	0.001 (0.0002)***	4.13
Analytic	0.004 (0.001)***	3.75
Self-references	−0.04 (0.01)**	−3.14
Lexical diversity	−3.84 (0.27)***	−14.36
Reading difficulty	0.04 (0.01)***	3.74
Certainty	–	–
High valence	–	–
Low valence	–	–
High arousal	–	–
Low arousal	–	–
High dominance	–	–
Low dominance	–	–
Hedges	–	–
Examples	–	–
Abstract/concrete	–	–

***p < 0.001; **p < 0.01; λ = 62
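The selection procedure described before Table 4 (one L1-penalized logistic fit per candidate penalty, keeping the fit with the lowest BIC) can be sketched in plain Python. This is a minimal illustration under several assumptions that do not come from the study: it uses a proximal gradient solver rather than glmmLasso, omits the intercept and the random effects, and runs on synthetic data with a hypothetical penalty grid. All function and variable names are illustrative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def neg_log_lik(X, y, beta):
    # Negative Bernoulli log-likelihood of a logistic model without intercept.
    total = 0.0
    for row, target in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, row)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total


def gradient(X, y, beta):
    # Gradient of the negative log-likelihood with respect to beta.
    grad = [0.0] * len(beta)
    for row, target in zip(X, y):
        err = sigmoid(sum(b * v for b, v in zip(beta, row))) - target
        for j, v in enumerate(row):
            grad[j] += err * v
    return grad


def soft_threshold(value, cutoff):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0


def fit_lasso_logistic(X, y, lam, n_iter=300):
    # Proximal gradient descent on: neg_log_lik + lam * sum(|beta_j|).
    # Step size comes from a Lipschitz bound: 0.25 * sum of squared entries.
    step = 1.0 / (0.25 * sum(v * v for row in X for v in row))
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = gradient(X, y, beta)
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta


def bic(X, y, beta):
    # BIC = 2 * NLL + (number of nonzero coefficients) * log(n).
    k = sum(1 for b in beta if b != 0.0)
    return 2.0 * neg_log_lik(X, y, beta) + k * math.log(len(X))


def select_by_bic(X, y, grid):
    # Fit one model per penalty value and keep the fit with the lowest BIC.
    best = None
    for lam in grid:
        beta = fit_lasso_logistic(X, y, lam)
        score = bic(X, y, beta)
        if best is None or score < best[0]:
            best = (score, lam, beta)
    return best


# Synthetic data (hypothetical): only the first of three features matters.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [rng.gauss(0, 1) for _ in range(3)]
    X.append(row)
    y.append(1 if rng.random() < sigmoid(1.5 * row[0]) else 0)

grid = [0.0, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]  # hypothetical penalty grid
best_bic, best_lam, best_beta = select_by_bic(X, y, grid)
print("selected lambda:", best_lam)
print("coefficients:", [round(b, 3) for b in best_beta])
```

Features whose coefficient is exactly zero at the selected penalty correspond to the rows marked with a dash in Table 4. The penalty value reported there (λ = 62) is on the scale used by the authors' software and is not comparable to the grid in this sketch.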

These variables were subsequently entered into a multilevel logistic regression. Again, persuasion was entered as the dependent variable and we included random intercepts for replies nested within parent posts and replies nested within repliers. All five variables emerged as significant predictors of persuasion. Specifically, for a one-unit increase in word count, the odds of receiving a delta increase by a factor of 1.23, 95% CI [1.13, 1.35]. For a one-unit increase in reading difficulty scores (i.e., greater difficulty in reading comprehension), the odds of receiving a delta increase by a factor of 1.10, 95% CI [1.04, 1.16]. For a one-unit increase in analytical thinking, the odds of receiving a delta increase by a factor of 1.10, 95% CI [1.05, 1.17]. For a one-unit increase in self-references, the odds of receiving a delta are multiplied by 0.92 (an 8% decrease), 95% CI [0.87, 0.98]. Last, for a one-unit increase in lexical diversity, the odds of receiving a delta are multiplied by 0.54 (a 46% decrease), 95% CI [0.50, 0.59]. Post-hoc power analyses conducted using simr [34] revealed that we had at least 96% power to detect a small effect (i.e., 0.15) for each of these predictors on persuasion.
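To make the odds-ratio language concrete, the short sketch below shows how a log-odds coefficient is usually converted into an odds ratio with a Wald-type 95% interval, and how an odds ratio maps to a percent change in odds. The function names and the example coefficient and standard error are hypothetical; only the odds ratios 1.23 and 0.54 are taken from the text, and the interval method used by the authors' software may differ.

```python
import math


def odds_ratio_with_ci(estimate, std_error, z_crit=1.96):
    # Exponentiate a log-odds coefficient and its Wald interval bounds.
    lower = math.exp(estimate - z_crit * std_error)
    upper = math.exp(estimate + z_crit * std_error)
    return math.exp(estimate), lower, upper


def percent_change_in_odds(odds_ratio):
    # Odds ratios above 1 raise the odds; odds ratios below 1 lower them.
    return (odds_ratio - 1.0) * 100.0


# Hypothetical coefficient and standard error, for illustration only.
ratio, low, high = odds_ratio_with_ci(estimate=0.2, std_error=0.05)
print(f"OR = {ratio:.2f}, 95% CI [{low:.2f}, {high:.2f}]")

# Odds ratios reported in the text for word count and lexical diversity.
for reported in (1.23, 0.54):
    print(f"OR {reported}: {percent_change_in_odds(reported):+.0f}% change in odds")
```

An odds ratio is a multiplier on the odds, so a value of 0.54 means the odds are multiplied by 0.54 for each one-unit increase in the predictor, not that the probability falls by 54 percentage points.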

Discussion

Previous studies have largely examined the effect of linguistic features on persuasion in isolation and do not incorporate properties of language that are often involved in real-world persuasion. As such, little is known about the key verbal dimensions of persuasion or the relative impact of linguistic features on a message’s persuasive appeal in real-world social interactions. To address these limitations, we collected large-scale data on online social interactions from a public forum in which users engage in debates in an attempt to change each other’s views on any topic. Messages that successfully changed a user’s views are explicitly marked by the user themselves. We simultaneously examined linguistic features that have been previously linked with message persuasiveness between persuasive and non-persuasive messages. Our findings provide a parsimonious and ecologically valid understanding of the social psychological pathways to persuasion as it operates in the real world through verbal behavior.
Three linguistic dimensions appeared to underlie the tested features: structural complexity, negative emotionality, and positive emotionality. Each dimension uniquely predicted persuasion when the effects of the remaining dimensions were statistically controlled, with greater structural complexity exhibiting the highest odds of persuasion. Interestingly, messages marked with less emotionality had higher odds of persuasion than messages marked with more emotionality, regardless of whether the emotionality was positive or negative. Emotionality can help persuasion in specific contexts [37, 38], but emotional appeals can also backfire when audiences prefer cognitive appeals [39]. Given that OPs were publicly inviting others to debate them, it is plausible that they preferred cognitively appealing responses (ones that include an abundance of clear and valid reasons to support an argument) rather than emotionally appealing responses.
The linguistic features that made a message longer, more analytic, less anecdotal, more difficult to read, and less lexically diverse were most essential to a message’s persuasive appeal and uniquely predictive of persuasion. Longer messages provide more context and likely contain more arguments than shorter messages. Presenting more arguments can be more persuasive even if the arguments themselves are not compelling [40]. Longer messages likely provided more opportunities for the OP to engage with material that could potentially change their mind, thus increasing the likelihood of persuasion. Although more readable content is easier to understand and less aversive than less readable content [41], greater difficulty in reading and comprehension can engender more interest, attention, and engagement [42, 43]. It can also facilitate deeper cognitive processing that leads to greater learning and long-term retention [44, 45]. This is especially true for individuals who are intrinsically motivated to engage in, or capable of engaging in, complex and novel tasks [46]. OPs were likely both capable of engaging with and intrinsically motivated to engage with content that challenged their beliefs, considering they were inviting others to debate them. The interpretation of users being intrinsically motivated to challenge their beliefs is also in line with the link that emerged between greater usage of analytical language and persuasion. Similarly, messages that focused less on one’s own personal experiences may have provided more objective evidence to support a particular argument, facilitating persuasion. Last, although greater lexical repetition may be perceived as less interesting [31, 47], it facilitated persuasion in this context. Lexical repetition provides an effective way for speakers to communicate complex topics because it keeps “lexical strings relatively simple, while complex lexical relations are constructed around them” [48]. Lexical repetitions are advantageous for navigating through the order and logic of an argument, providing “textual markers” that help readers connect important aspects of an argument together [49]. Lower lexical diversity, then, appeared to be beneficial for building arguments that are more cohesive, more coherent, and thus, more persuasive.
Altogether, our findings reveal that the linguistic features linked to persuasion fall along three dimensions pertaining to structural complexity, negative emotionality, and positive emotionality. Our findings also highlight the importance of linguistic features related to a message’s structural complexity, particularly the verbal behaviors that provide a greater amount of factual evidence in a way that enables readers to connect important aspects of the information in an appropriately stimulating manner. Although the other linguistic features that were examined in this study may contribute to message persuasiveness to some degree, our results indicate that they are relatively less important after word count, lexical diversity, reading difficulty, analytical thinking, and self-references are taken into account.
These findings also seem to reflect r/ChangeMyView’s digital environment. A central feature of r/ChangeMyView is ensuring that all posts and replies meaningfully contribute to the conversations. As such, OPs and repliers must adhere to all moderator-enforced policies of interaction. In addition, users who post on r/ChangeMyView are likely individuals who are open to attitude change given that they are publicly inviting others to debate them on a topic they already have an opinion on. This suggests that, in digital environments that underscore meaningful contributions to conversations, the ability to convey more objective information while fostering engagement and a holistic understanding of an argument is most vital to the alteration of established attitudes among open-minded individuals.
Our findings also have implications for the process by which persuasion research via language is conducted. Assessing the relative importance of a linguistic feature for message persuasiveness allowed us to understand its interconnections with other linguistic features and its link to persuasion, yielding a more comprehensive and well-rounded understanding of the feature’s role in message persuasiveness. Consider word count, for example: without assessing word count’s relative importance for message persuasiveness in the current study, we would not have been able to ascertain its link to message persuasiveness via a message’s structural complexity and the importance of providing more content in a way that enables readers to connect important aspects of the information in an appropriately stimulating manner. Because the meaning of a word or linguistic feature in any text is dependent on the context in which it is used, understanding the social psychological pathways to persuasion via language requires researchers to account for the presence of multiple linguistic features within a given message when assessing a linguistic feature’s link to message persuasiveness. This holistic approach may also help reconcile conflicting results from previous research on language and persuasion.
Our findings also inform theories, such as ELM, that address how linguistic features influence persuasion and provide a more precise understanding of the social psychological pathways to persuasion. For example, ELM states that there are two main routes to persuasion: the central route, which centers on the quality of the message, and the peripheral route, which uses heuristics and peripheral cues to help influence individual decisions regarding a topic [6]. Individuals are more likely to be persuaded via the central route if they have the ability and motivation to process the information. On the other hand, individuals are more likely to be persuaded via the peripheral route if involvement is low and information processing capability is diminished. OPs likely have the ability and motivation to process arguments from repliers and are thus likely persuaded via the central route given that they are publicly inviting others to debate them. Supplying more information to support a conclusion may be more likely to persuade via the central route, but this information also needs to be organized in a way that helps readers connect important aspects of the information together. A wealth of information that is structured in an incoherent manner would undoubtedly hinder comprehension, and thus, persuasion.

Strengths and limitations

Our dataset contained a large sample of replies that spanned a wide variety of topics, and provided high ecological validity given that it captured the process of persuasion as it occurred naturally without elicitation. The enforcement of rules on r/ChangeMyView yielded interactions that were conducted under similar conditions and expectations. This helped to minimize interaction variance without interfering with the naturalistic nature of the data. However, OPs can award deltas to responses within subtrees (the “children” of direct replies), typically as the result of “back-and-forth” interactions with repliers. These were not included in the current study as we only examined top-level responses. Our results could also differ by topic, recency of the post, and post length. It is also possible that non-linguistic features affect message persuasiveness, such as the popularity of a post, the number of “upvotes” a reply receives (i.e., the number of instances other users have registered agreement with a particular post or reply), and the number of deltas a replier has ever received. Future studies should determine whether these variables moderate the findings; doing so would also address the relative importance of linguistic versus non-linguistic features for message persuasiveness. Although it is a policy on r/ChangeMyView that OPs must post a non-neutral opinion (i.e., their post must take a non-neutral stance on a topic), and posts that violate this rule are removed by moderators, it is possible that an OP’s post did not accurately reflect their true attitude or attitude strength. Given the nature of the data, this study cannot address whether the resulting attitude changes were long-lasting, nor whether the OP’s attitude strength moderated their attitude change. Longitudinal studies can assess these points.
Because there were substantially more non-persuasive replies (99.39%) than persuasive ones, we constructed a balanced subsample and conducted our analyses on this balanced subsample. While this strategy limited biased outcomes stemming from a large class imbalance, it also limits the generalizability of results to posts in which no persuasion occurred. Further examinations of the class imbalance are needed to address this issue. For example, it is possible that posts in which no persuasion occurred are systematically different from posts in which persuasion occurred. Or, perhaps the class imbalance simply reflects the rigid nature of attitudes. In addition, our results may only reflect a particular population given that Reddit users tend to skew younger and male [50]. Since we did not have access to subjects’ demographic information, we cannot assert the representativeness of our sample. Future research should investigate persuasion that takes place on other debate-style forums and websites to incorporate more diverse subjects, interaction modes, and digital environments.
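One common way to construct a balanced subsample is random undersampling of the majority class, sketched below. This is an assumption for illustration; the exact sampling procedure used in the study may differ, and the function name, seed, and toy label vector are hypothetical (the toy data only roughly mimics a split in which well under 1% of replies are persuasive).

```python
import random


def balanced_subsample(labels, seed=0):
    # Keep every minority-class index and draw an equal number of
    # majority-class indices uniformly at random, without replacement.
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    minority, majority = sorted((positives, negatives), key=len)
    sampled = rng.sample(majority, len(minority))
    return sorted(minority + sampled)


# Toy labels (hypothetical): 6 persuasive replies among 1,000.
labels = [1] * 6 + [0] * 994
kept = balanced_subsample(labels)
print(len(kept), sum(labels[i] for i in kept))
```

Because most majority-class rows are discarded, estimates from the balanced subsample describe a 50/50 mix of outcomes rather than the base rate in the full data, which is the generalizability concern raised above.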

9 in total

1. Effects of communication goals and expectancies on language abstraction.

Authors: Karen M Douglas; Robbie M Sutton
Journal: J Pers Soc Psychol Date: 2003-04

2. Truth from language and truth from fit: the impact of linguistic concreteness and level of construal on subjective truth.

Authors: Jochim Hansen; Michaela Wänke
Journal: Pers Soc Psychol Bull Date: 2010-10-14

3. Should persuasion be affective or cognitive? The moderating effects of need for affect and need for cognition.

Authors: Geoffrey Haddock; Gregory R Maio; Karin Arnold; Thomas Huskinson
Journal: Pers Soc Psychol Bull Date: 2008-03-14

4. Activating representations in permanent memory: different benefits for pictures and words.

Authors: L S Seifert
Journal: J Exp Psychol Learn Mem Cogn Date: 1997-09 Impact factor: 3.051

5. The Easier the Better? Comparing the Readability and Engagement of Online Pro- and Anti-Vaccination Articles.

Authors: Zhan Xu; Lauren Ellis; Laura R Umphrey
Journal: Health Educ Behav Date: 2019-06-19

6. Norms of valence, arousal, and dominance for 13,915 English lemmas.

Authors: Amy Beth Warriner; Victor Kuperman; Marc Brysbaert
Journal: Behav Res Methods Date: 2013-12

7. Hedging their mets: the use of uncertainty terms in clinical documents and its potential implications when sharing the documents with patients.

Authors: David A Hanauer; Yang Liu; Qiaozhu Mei; Frank J Manion; Ulysses J Balis; Kai Zheng
Journal: AMIA Annu Symp Proc Date: 2012-11-03

8. Psychological targeting as an effective approach to digital mass persuasion.

Authors: S C Matz; M Kosinski; G Nave; D J Stillwell
Journal: Proc Natl Acad Sci U S A Date: 2017-11-13 Impact factor: 11.205
