How quantifying the shape of stories predicts their success.

Olivier Toubia¹, Jonah Berger², Jehoshua Eliashberg².

Abstract

Narratives, and other forms of discourse, are powerful vehicles for informing, entertaining, and making sense of the world. But while everyday language often describes discourse as moving quickly or slowly, covering a lot of ground, or going in circles, little work has actually quantified such movements or examined whether they are beneficial. To fill this gap, we use several state-of-the-art natural language-processing and machine-learning techniques to represent texts as sequences of points in a latent, high-dimensional semantic space. We construct a simple set of measures to quantify features of this semantic path, apply them to thousands of texts from a variety of domains (i.e., movies, TV shows, and academic papers), and examine whether and how they are linked to success (e.g., the number of citations a paper receives). Our results highlight some important cross-domain differences and provide a general framework that can be applied to study many types of discourse. The findings shed light on why things become popular and how natural language processing can provide insight into cultural success.

Keywords: cultural analytics; cultural success; discourse; natural language processing

Year: 2021 PMID: 34172568 PMCID: PMC8256009 DOI: 10.1073/pnas.2011695118

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205

Narratives and other forms of discourse are powerful vehicles for informing, entertaining, maintaining social order, and making sense of the world (1–5). People watch movies, read books, and consume other narratives, and politicians, journalists, and even academics craft discourse when communicating and sharing ideas. But why are some narratives, or other types of discourse, more successful? And could a simple set of measures help explain variation in success in different domains? Across disciplines, researchers have long been interested in features of narratives (6–9). While some narratives seem to move faster, for example, others seem to move slower (10, 11). Similarly, some stories are described as “covering lots of ground” and some narratives are described as “going in circles” (i.e., returning to similar themes again and again). But while researchers and laypeople alike often describe narratives as expressing movement in some abstract space, little empirical work has actually attempted to measure such movements (8, 9, 12). Further, even less work has examined whether such movements have any impact (13, 14). Might certain ways of unfurling a set of ideas increase their success? Are movies that cover a lot of ground, for example, evaluated more positively? Note that these questions are not restricted to narratives. Narratives usually involve temporality (15, 16), or a sequence of events and actions, but similar questions could be asked of other types of discourse. Some academic papers or legal arguments, for example, seem to cover more ground than others and some textbooks move quickly through disparate ideas while others move more slowly. Might these features shape success (e.g., the number of citations an academic paper receives) and, if so, how? Attempts to answer such questions have been hindered by quantification. It is difficult to measure, for example, whether one text moves quickly or slowly. 
While manually coding such aspects might be possible for a small number of texts (17, 18), it is often subjective and difficult to scale. We fill this gap using natural language processing and machine learning. In any given text, some content appears earlier in the text and other content later. Using several state-of-the-art techniques, we plot chunks of texts as sequential points in a multidimensional space and extract features of the semantic path (i.e., speed, volume, and circuitousness). We examine tens of thousands of texts from a variety of domains (i.e., movies, TV shows, and academic papers) and test how speed, volume, and circuitousness relate to success (e.g., evaluations or citations). Importantly, we do not mean to suggest that academic papers are narratives or that what makes a movie successful is the same as what makes an academic paper successful. In fact, our findings suggest the drivers are quite different. Rather, our goal is to provide simple measures that help quantify the semantic progression of texts and illustrate how such measures relate to success in different domains.

Measures

To quantify semantic progression, we take texts (e.g., movies or academic papers), break them into approximately equal-sized chunks or windows, plot each chunk in a high-dimensional semantic space, and examine the path between chunks (see ref. 8 for related work using topic modeling). To do so, we use word embeddings (19), a technique that transforms words into high-dimensional numerical vectors such that the relationship between vectors captures the semantic relationship between words. We take each word $w$, represented by $v_w$, a 300-dimensional vector; index windows by $t$; and define $x_t$ as the average word embedding vector for the words in each window $t$. We denote $T$ as the number of windows in the document, and each text is represented by a sequence of points, $x_1, \ldots, x_T$, in the 300-dimensional latent word embedding space (see for more detail). From these sequences of points, we calculate new measures that characterize each text’s semantic path.

A natural first measure of progression is speed or pacing (10). Just as a car can move slower or faster (i.e., covering a smaller or larger distance in the same period), content can move slower or faster (i.e., dwelling on semantically related concepts or moving a larger distance between content that is less semantically related). To capture this, we measure the distance texts travel between consecutive chunks (Fig. 1). Word embeddings capture semantic similarity (20–22) (see for additional validation), so consecutive chunks that are farther away are more likely to discuss different topics or themes. We compute the Euclidean distance between consecutive points: $\mathrm{distance}(t) = \lVert x_t - x_{t-1} \rVert$. Normalizing total distance by text length generates the text’s average speed: $\mathrm{speed} = \frac{1}{T-1} \sum_{t=2}^{T} \mathrm{distance}(t)$.
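The speed measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the toy path uses two-dimensional points instead of 300-dimensional window embeddings, and average speed is taken to be the total distance divided by the number of steps between windows.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def step_distances(points):
    """Distance traveled between each pair of consecutive windows."""
    return [euclidean(points[t], points[t - 1]) for t in range(1, len(points))]

def average_speed(points):
    """Total distance divided by the number of steps (windows minus one)."""
    if len(points) < 2:
        raise ValueError("need at least two windows to define speed")
    distances = step_distances(points)
    return sum(distances) / len(distances)

# Toy path with three "windows" (hypothetical 2-D points, for illustration only).
path = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
print(step_distances(path))  # [5.0, 4.0]
print(average_speed(path))   # 4.5
```

A text whose consecutive windows sit close together in the embedding space would yield small step distances and therefore a low average speed.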

Fig. 1.

Stylized illustration of the measures. Note that higher speed means more distance was covered in the same number of periods. Higher volume means that more ground was covered in the same number of periods. Higher circuitousness means that a less direct route was taken between a set of points.

Speed presents a tradeoff. Larger semantic shifts should make content more engaging and exciting (4), but require additional cognitive effort to process and connect (23). More difficult textbooks, for example, tend to have less semantic similarity between paragraphs (24), which should require greater processing to understand (25). Consequently, the excitement that speed generates likely comes at a (cognitive) cost. As such, speed may have a positive or negative relationship with success, depending on the context.

While speed is useful, it does not provide a complete picture. Two texts could cover the same distance with quite different semantic trajectories (e.g., one goes out and back, while the other goes out and then out even farther). Further, speed focuses only on consecutive points, but the meaning of content is often interpreted from the entire path (2). To begin to capture these nuances, we measure the volume that a text covers (Fig. 1). While some content is described as covering a lot of ground or touching on many themes, other content is seen as covering less ground (26). We measure such volume by approximating points with an ellipsoid by solving an optimization problem that finds the minimum volume ellipsoid containing all of these points (27). Normalizing this by the dimensionality of the ellipsoid captures a text’s volume and ancillary analyses show that this automated measure is correlated with human perceptions of ground covered (). Similar to speed, volume presents a tradeoff. Covering a lot of ground allows audiences to see and connect a wide range of topics but may increase the cognitive burden.
Volume captures the ground covered, but not how these points are covered, so to further quantify the path taken we measure circuitousness (see Fig. 1 and for a less simplified illustration). We identify the shortest path a text could have taken, given the first point $x_1$, the last point $x_T$, and the other set of points “visited” during the text. This optimization problem is a modified version of the well-known traveling salesman problem (28). After solving this, we quantify the extent to which the actual sequence deviates from optimal. Circuitousness is defined as the ratio of actual distance traveled to the shortest possible path. That is, $\mathrm{circuitousness} = \frac{\sum_{t=2}^{T} \lVert x_t - x_{t-1} \rVert}{\text{length of shortest path}}$, where the numerator is the total Euclidean distance between consecutive points. This measure captures human perceptions of circuitousness (). While circuitousness might seem undesirable, it may allow the audience to create new and deeper connections between previously explored themes (29).
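As a rough sketch of the circuitousness ratio, the example below finds the shortest path that starts at the first point, ends at the last point, and visits every other point once, by brute-force enumeration. This is only an illustration under stated assumptions: the function names are hypothetical, the points are two-dimensional toys, and exhaustive search is feasible only for a handful of windows; how the optimization was actually solved for real texts is not described in this section.

```python
import itertools
import math

def path_length(points):
    """Sum of Euclidean distances between consecutive points."""
    total = 0.0
    for a, b in zip(points, points[1:]):
        total += math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return total

def shortest_path_length(points):
    """Shortest route from points[0] to points[-1] visiting all interior points.

    Brute force over every ordering of the interior points (factorial cost),
    so this is usable only for very short toy sequences.
    """
    first, last = points[0], points[-1]
    interior = points[1:-1]
    best = math.inf
    for order in itertools.permutations(interior):
        best = min(best, path_length([first, *order, last]))
    return best

def circuitousness(points):
    """Actual distance traveled divided by the shortest possible path."""
    shortest = shortest_path_length(points)
    if shortest == 0:
        raise ValueError("all points coincide; the ratio is undefined")
    return path_length(points) / shortest

direct = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
zigzag = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(circuitousness(direct))  # 1.0
print(circuitousness(zigzag))  # 5 / 3, roughly 1.67
```

Both toy paths visit the same four points, so they cover the same ground; only the order differs, which is exactly what the ratio isolates.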

Results

We examine the relationship between these measures and success in three domains (i.e., movies, television shows, and academic papers). These domains were chosen based on data availability (), but the same approach could be applied to other types of texts (e.g., books or speeches). In addition to a standard set of control variables (), we control for textual content by including 100 topic intensities estimated by latent Dirichlet allocation (30). This ensures that our results are not driven by certain topics (e.g., love or social identity) being linked to success. Examining over 4,000 movies finds that narratives that move faster (i.e., travel farther in consecutive periods, on average) are evaluated more favorably (Table 1, column 1).

Table 1.

Link between semantic progression and success

	Movies	TV show episodes	Academic papers
Average speed	0.048*	0.072*	−0.125*
Normalized volume	0	−0.082*	0.095*
Circuitousness	0	0.006	0.070*
Controls
Year fixed effects	Yes		Yes
Genre fixed effects	Yes	Yes
Movie duration	Yes
TV channels fixed effects		Yes
Journal fixed effects			Yes
No. of pages			Yes
Log(words in document)	Yes	Yes	Yes
Log(sentences in document)	Yes	Yes	Yes
Topic intensities	Yes	Yes	Yes
No. of parameters	169	148	158
No. of observations	4,118	12,336	29,300
Mean-squared error	0.711	0.793	1.066
R²	0.306	0.326	0.364

Note that all independent variables for which coefficients are reported are standardized. The dependent variable is not standardized. Parameters are estimated using a lasso regression. Confidence intervals are obtained via bootstrapping. *The 95% confidence interval does not include 0. Dependent variable is IMDB ratings for movies and TV show episodes and log(1 + citations) for academic papers.

Examining over 12,000 TV show episodes finds a similar result (Table 1, column 2). Given that distant points are less similar, they should be more surprising or unexpected. This result is consistent with the suggestion that rapid storyline changes can make narratives more engaging (4). TV show episodes that cover less volume are also evaluated more favorably. While one could interpret this as driven by TV show episodes being shorter than movies, note that volume is normalized by the number of chunks of text, indicating that even for text of the same length, TV show episodes that cover too much ground are evaluated less favorably. This may be driven by what audiences look for when they consume content from different mediums. While high-volume movies may fit audiences’ expectations of being transported through a narrative, TV shows may be consumed as a quick diversion, and thus volume may have a more negative effect. Note that average speed and normalized volume are highly positively correlated in TV show episodes (), so each coefficient captures the effect of changing that variable, holding the others constant. Examining citations of 29,000 academic papers published in 22 journals reveals a distinctly different pattern (Table 1, column 3). First, speed has the opposite effect; papers that move faster are cited less. Rapid changes should increase the effort required to follow an argument, which may reduce citations. 
Second, volume has the opposite effect; papers that cover more ground are cited more (consistent with the finding that papers that link disconnected areas of knowledge receive more cites, ref. 31). Finally, papers that are more circuitous receive more citations. Consistent with the fact that “spiral” curriculums that revisit similar topics help students learn (32), by repeatedly touching on similar themes, circuitousness may make it easier to integrate disparate information. Given average speed and circuitousness are highly correlated in academic papers (), each coefficient should be interpreted as capturing the effect of changing that variable, holding the others constant. These effects are not trivial: A 1-SD increase in speed is associated with an approximately 12% decrease in citations [as noted in Table 1, the dependent variable for citations is log(1 + citations), so the coefficient should be interpreted carefully]. Volume and circuitousness show a similar effect (10 and 7%, respectively). Ancillary analyses () provide further context, comparing these variables’ explanatory power to noncontent variables shown to impact citations. The effect of a 1-SD change in average speed, for example, is comparable to the effect of a 1-SD change in institution prestige. Effects for TV show episodes and movies, which are not on a log scale, are more modest: A 1-SD increase in speed in movies (TV show episodes) is associated with an increase in average rating of 0.048 (0.072), on a 10-point scale with an SD of 1.01 (1.08). This is not surprising, given that movies and TV shows involve many nontextual factors (e.g., visual and audio elements). Ancillary analyses also begin to examine how distance and volume change over the course of a text (). Some texts, for example, might move at a consistent speed while others have more variation. Some might cover a lot of volume early on but less so as the content evolves. 
Given that recent experiences (e.g., the end of a movie) can have a larger impact on evaluations (33), one could imagine that end effects are particularly important. To capture these aspects, we calculate how much each new period adds to the text’s distance and volume and measure variation, trend, and end effects for incremental changes in both distance and volume (). Results are identical for movies and academic papers, but provide a more nuanced picture for TV show episodes: The positive effect of speed and the negative effect of volume are driven by changes that happen toward the end of the text.
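The percentage effects quoted above for academic papers follow from the log scale of the dependent variable. The sketch below shows the arithmetic using the coefficients reported in Table 1; the function name is hypothetical, and because the outcome is log(1 + citations), the percentage applies to (1 + citations) rather than to the raw citation count, which is one reason the coefficients should be interpreted carefully.

```python
import math

# Coefficients for academic papers as reported in Table 1 (standardized predictors).
coefficients = {
    "average speed": -0.125,
    "normalized volume": 0.095,
    "circuitousness": 0.070,
}

def percent_change(beta):
    """Percent change in (1 + citations) for a 1-SD increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0

for name, beta in coefficients.items():
    print(f"{name}: {percent_change(beta):+.1f}%")
```

Rounded to whole percentages, these come out at about a 12% decrease for speed and increases of about 10% and 7% for volume and circuitousness, matching the figures in the text.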

Discussion

While many have theorized about features of narratives, less work has formalized these intuitions, or tested whether certain features of discourse are linked to success. This paper provides a set of measures to quantify the semantic progression of texts and the ground they cover. In particular, we examined speed, volume, and circuitousness and how they relate to the success of movies, TV show episodes, and academic papers. Results suggest that the features that make a successful movie may be different from those that make a successful TV show or academic paper, and future work might examine the roots of these cross-domain differences. The type of discourse (e.g., narrative vs. exposition), goal (e.g., to entertain vs. impart knowledge), modality (e.g., video vs. written), outcome measure (e.g., liking vs. citations), and audience expectations may all be important factors. Future work might also examine other types of texts (e.g., books, speeches, or documentaries). A preliminary analysis of 564 fiction books, for example, suggests that the measures reported here may also be helpful in understanding the success of books (). These measures could also be applied to personal narratives. People often use narratives to explain and understand their own lives (34). Just as creative people have more distance (i.e., less semantic relatedness) between their thoughts (35), semantic progression in personal narratives may provide insight into the writer’s personality or even how the act of writing impacts wellbeing (36). Note that we focus on the semantic relation between chunks of text, not on the content of each specific chunk. Two movies may have completely different content (i.e., characters and setting) but have similar speed, volume, or circuitousness. The structures we examine are also different from, and complementary to, dramatic structure (6) or emotional trajectories (9, 12). 
Rather than examining how sentiment changes across the course of a narrative, for example, or where the climax occurs, we focus on the semantic relationship between different points (i.e., whether content moves quickly between disparate ideas or covers a set of points that are semantically less similar). This work makes several theoretical contributions. First, it contributes to cultural analytics and understanding why cultural items succeed and fail. While some work suggests that cultural success is difficult, if not impossible, to predict due to dynamics of social influence (37), our paper finds that success is not completely random and that item characteristics may also play an important role. While this does not negate the importance of social dynamics, it highlights that with the right tools, researchers can extract features of cultural items that shed light on their success (31, 38–41). Second, and along those lines, this work also highlights the value of natural language processing to study culture (42). Researchers have long been interested in quantifying narratives and cultural dynamics, but measurement has been a key challenge. Natural language processing, however, provides a reliable method of extracting features and doing so at scale (43, 44). Consequently, it opens up a range of interesting avenues for further research. These tools may be particularly useful for researchers in philosophy, English, and other disciplines who are interested in quantifying aspects of discourse. Researchers in the digital humanities have recently made a number of interesting advances (8, 9, 13, 45, 46), and with the right tools, hopefully scholars can begin to quantify features of culture only dreamed about previously. Third, our findings dovetail with recent work on how psychological processes shape collective outcomes. 
A great deal of research has demonstrated that sociocultural background shapes individual-level psychological process (e.g., cognition and attribution) (47). But the reverse is also true; when shared across individuals, psychological processes can act as a selection mechanism, shaping the content of collective culture (48–50). In this case, how people process information, and evaluate content, may shape which movies, shows, and academic papers are more successful. Future work might examine the underlying cognitive and social processes that underlie these effects. As noted, desire for stimulation or for surprise, cognitive complexity, or processing ease, and a number of other aspects may all play a role. While it is difficult to test psychological mechanisms in field data, subsequent experimental investigations can hopefully manipulate different aspects directly and examine the underlying processes in greater detail. Work might also examine the consequences of these features for other downstream outcomes (e.g., comprehension, memory, and persuasion). Readers might learn more from content that covers more volume, for example, although covering too much ground too quickly may mitigate this effect. Similarly, circuitousness may, at least in some cases, improve memory by connecting new ideas to previously explored themes. Semantic progression may also impact the persuasiveness of things like political speeches or legal arguments. In conclusion, narratives and other forms of discourse offer a fertile ground to study features of content that shape success, and their psychological underpinnings. Emerging natural language-processing tools should open up a range of interesting directions for further study.

Materials and Methods

Data Preprocessing.

For each document (i.e., each movie, TV show episode, or academic paper), we tokenize the text (i.e., extract individual words from the script), transform each word to lowercase, and look up the embedding of each word $w$, denoted as $v_w$, a 300-dimensional vector. We use the word2vec word embedding model trained by ref. 19, which represents approximately 1 million words as real vectors in a 300-dimensional latent space. Other embedding approaches (e.g., GloVe) yield similar results (). We split each document into nonoverlapping windows of approximately equal size. Based on prior work (51), we use the same target window size of 250 words across our three datasets, but to avoid breaking up sentences, some windows are slightly larger than 250 words. For example, if the first 10 sentences contain 240 words and the 11th sentence contains 15 words, we include all 11 sentences and end up with 255 words in the window. Results are similar for windows of other sizes (). We index windows by $t$ and define the average word embedding vector for the words in each window $t$, $x_t = \frac{1}{|W_t|} \sum_{w \in W_t} v_w$, where $W_t$ is the set of words in window $t$.
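The windowing and averaging steps above can be sketched as follows. This is an illustrative reading of the description, not the authors' pipeline: the function names are hypothetical, sentences are assumed to be already tokenized, leftover words at the end of a document are assumed to form a final shorter window, and words missing from the embedding table are assumed to be skipped.

```python
def split_into_windows(sentences, target_words=250):
    """Group whole sentences into windows of at least target_words words.

    sentences: list of sentences, each a list of word tokens.
    A window is closed as soon as it reaches the target, so sentences are
    never broken up and windows can slightly exceed the target size.
    """
    windows, current = [], []
    for sentence in sentences:
        current.extend(sentence)
        if len(current) >= target_words:
            windows.append(current)
            current = []
    if current:
        # Assumption: trailing words become a final, shorter window.
        windows.append(current)
    return windows

def window_embedding(words, embeddings):
    """Average the embedding vectors of the (lowercased) words in one window."""
    vectors = [embeddings[w.lower()] for w in words if w.lower() in embeddings]
    if not vectors:
        return None  # Assumption: a window with no known words has no embedding.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# The example from the text: 10 sentences totaling 240 words, then a 15-word sentence.
sentences = [["w"] * 24 for _ in range(10)] + [["w"] * 15]
print([len(w) for w in split_into_windows(sentences)])  # [255]

# Toy 2-dimensional embedding table (real vectors would have 300 dimensions).
toy = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
print(window_embedding(["Cat", "dog"], toy))  # [0.5, 0.5]
```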

Detailed Description of Volume Calculation.

To calculate volume, we start by finding the minimum-volume enclosing ellipsoid containing points $x_1, \ldots, x_T$. The problem may be written as follows (27): Maximize $\det(A)$ subject to: $(x_t - c)^\top A (x_t - c) \le 1$ for $t = 1, \ldots, T$, where $c$ is the center of the ellipsoid; $A$ is a positive definite matrix. When the rank of the subspace spanned by the vectors $x_1, \ldots, x_T$ is equal to 300 and there are at least 301 points, we solve the above problem directly using MATLAB code made available by ref. 27. However, if the number of points is less than or equal to the dimensionality of the word embedding space (300), or if the subspace spanned by the vectors is not full rank, then the above problem is degenerate, as it is possible to cover all points with a “flat ellipse” that lives in a subspace of dimension lower than 300. For example, two points in a three-dimensional space may be covered by a line segment, and three points may be covered by an ellipse in the two-dimensional plane that contains these three points, which has a volume of 0 in the original three-dimensional space (see for an illustration). In these cases, we find the minimum-volume ellipsoid that contains all of the points $x_1, \ldots, x_T$, in the corresponding subspace (instead of the entire word embedding space). See for details.

Once the minimum-volume enclosing ellipsoid has been computed, we find the eigenvalues of the (positive definite) matrix that defines this ellipsoid. The lengths of the axes of the ellipsoid are given by the inverse of the square root of the eigenvalues. The volume of the minimum enclosing ellipsoid that contains a set of points is equal to the volume of the unit sphere, multiplied by the product of the lengths of its axes (52). Therefore, the product of the length of the axes gives us a measure of volume relative to a unit sphere. To compare texts of different lengths, we normalize this measure by the dimensionality of the ellipsoid; i.e., we use the geometric mean (rather than the product) of the lengths of the axes of the minimum-volume ellipsoid corresponding to points $x_1, \ldots, x_T$, as our normalized measure of volume. 
This measure may be interpreted as the ground covered by the text.
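The final normalization step can be illustrated on its own. The sketch below assumes the eigenvalues of the matrix defining the minimum-volume ellipsoid have already been obtained from some solver (that step is not shown), and the function names are hypothetical. It converts eigenvalues to axis lengths and takes their geometric mean, working in logs so that a product over hundreds of axes does not overflow or underflow.

```python
import math

def axis_lengths(eigenvalues):
    """Axis lengths of the ellipsoid: inverse square roots of the eigenvalues."""
    if any(ev <= 0 for ev in eigenvalues):
        raise ValueError("eigenvalues of a positive definite matrix must be > 0")
    return [1.0 / math.sqrt(ev) for ev in eigenvalues]

def normalized_volume(eigenvalues):
    """Geometric mean of the axis lengths, computed via the mean of their logs."""
    axes = axis_lengths(eigenvalues)
    return math.exp(sum(math.log(a) for a in axes) / len(axes))

# Toy 2-D ellipsoid with axis lengths 2 and 1: the geometric mean is sqrt(2).
eigs = [0.25, 1.0]
print(axis_lengths(eigs))       # [2.0, 1.0]
print(normalized_volume(eigs))  # roughly 1.414
```

Because the geometric mean divides the log-volume by the number of axes, ellipsoids living in subspaces of different dimension end up on a comparable scale.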

Data and Empirical Approach.

The first dataset examines Internet Movie Database (IMDB) ratings of 4,118 movies based on their subtitles. The second dataset examines IMDB ratings of 12,401 episodes of TV shows based on closed captions. The third dataset examines the citations received by 29,300 academic articles published in 22 different journals in psychology, economics, sociology, political science, and anthropology between 1990 and 2019. See for more detail. To reduce overfitting, limit the number of nonzero coefficients, and reduce the effects of multicollinearity, we use lasso regression (53). Results are almost identical using ordinary least-squares regression (), but lasso seemed more appropriate because it is known to address multicollinearity. We log transform average speed, circuitousness, and normalized volume. Only observations for which all variables are available are included in the analysis. See for details.

20 in total

1. Emotional selection in memes: the case of urban legends.

Authors: C Bell; E Sternberg
Journal: J Pers Soc Psychol Date: 2001-12

2. Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790-2014.

Authors: Alix Rule; Jean-Philippe Cointet; Peter S Bearman
Journal: Proc Natl Acad Sci U S A Date: 2015-08-10 Impact factor: 11.205

3. The Function of Fiction is the Abstraction and Simulation of Social Experience.

Authors: Raymond A Mar; Keith Oatley
Journal: Perspect Psychol Sci Date: 2008-05

4. Atypical combinations and scientific impact.

Authors: Brian Uzzi; Satyam Mukherjee; Michael Stringer; Ben Jones
Journal: Science Date: 2013-10-25 Impact factor: 47.728

5. Combining natural language processing and network analysis to examine how advocacy organizations stimulate conversation on social media.

Authors: Christopher Andrew Bail
Journal: Proc Natl Acad Sci U S A Date: 2016-09-30 Impact factor: 11.205
