Pavan Holur1, Shadi Shahsavari1, Ehsan Ebrahimzadeh1, Timothy R Tangherlini2, Vwani Roychowdhury1.
Abstract
Social reading sites offer an opportunity to capture a segment of readers' responses to literature, while data-driven analysis of these responses can provide new critical insight into how people 'read'. Posts discussing an individual book on the social reading site, Goodreads, are referred to as 'reviews', and consist of summaries, opinions, quotes or some mixture of these. Computationally modelling these reviews allows one to discover the non-professional discussion space about a work, including an aggregated summary of the work's plot, an implicit sequencing of various subplots and readers' impressions of main characters. We develop a pipeline of interlocking computational tools to extract a representation of this reader-generated shared narrative model. Using a corpus of reviews of five popular novels, we discover readers' distillation of the novels' main storylines and their sequencing, as well as the readers' varying impressions of characters in the novel. In so doing, we make three important contributions to the study of infinite-vocabulary networks: (i) an automatically derived narrative network that includes meta-actants; (ii) a sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from reviews, and (iii) an 'impressions' algorithm, SENT2IMP, that provides multi-modal insight into readers' opinions of characters.
Keywords: narrative theory; natural language processing
Year: 2021 PMID: 34950484 PMCID: PMC8692958 DOI: 10.1098/rsos.210797
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Figure 7. A measure of perceived complexity per character across novels. The colour blue corresponds to the relative number of empirical samples per character-specific heatmap used to compute entropy (prior to smoothing). Each translucent colour corresponds to a specific novel, and the plotted values are the entropies of characters that have at least four impression clusters. We found b = 50 and w = 3 to be optimal hyperparameter choices for exploring the differences in the complexity measure between characters.
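The caption does not define how b and w enter the entropy computation. The sketch below is one plausible reading, not the authors' method: it assumes b is the number of histogram bins over a character's heatmap similarity scores and w is the width of a moving-average smoothing window. The function name `smoothed_entropy` and the score range `lo`/`hi` are hypothetical.

```python
import math

def smoothed_entropy(values, b=50, w=3, lo=-1.0, hi=1.0):
    """Shannon entropy (in bits) of a b-bin histogram of `values`,
    after moving-average smoothing with window width w.
    ASSUMPTION: b = bin count, w = smoothing window; lo/hi are a
    hypothetical score range, not taken from the paper."""
    counts = [0.0] * b
    for v in values:
        # Clamp to [lo, hi], then map to a bin index in 0..b-1.
        frac = (min(max(v, lo), hi) - lo) / (hi - lo)
        counts[min(int(frac * b), b - 1)] += 1

    half = w // 2
    smoothed = []
    for i in range(b):
        # Average over the window centred on bin i (truncated at the edges).
        window = counts[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))

    total = sum(smoothed)
    if total == 0:
        return 0.0
    probs = [c / total for c in smoothed if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Under this reading, a character whose similarity scores are spread over many bins receives a higher entropy, and hence a higher complexity score, than one whose scores concentrate in a few bins.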
Example impression clusters for ‘Bilbo’ in The Hobbit: clusters 1 and 2 describe impressions of Bilbo’s character, while clusters 3 and 4 describe his profession and community. The cluster marked −1 is noise. Labels for each cluster are aggregated based on the most frequent monograms per cluster.
| character | descriptors |
|---|---|
| Bilbo | The Hobbit |
| Cluster 1 | [‘not the interesting character’, ‘timid not’, ‘not enthusiastic’, ‘reluctant’, ‘not the type of hero’, ‘less cute’, ‘not as cool’, ‘unsure of situation’, ‘a small unadventurous creature’, ’Perhaps just not the kind of character’, ‘not as important’, ‘less cute’] |
| Cluster 2 | [‘a true personality’, ‘an exemplary character’, ‘such a great character’, ‘resourceful’, ’likable’, ‘still loveable’, ‘quite content’, ‘such a strong character’, ‘an amazing character’, ‘respectable’, ‘a great protagonist too’, ‘clever’, ‘such an amazing character’, ’a peaceful’, ’such an endearing character’, ’a great choice’, ’a fantastic lead character’, ’quite engaging’, ’cute’, ’much charismatic character’, ’such a fantastic Character’, ’truly beautiful’, ’enjoyable’, ’just so charming’, ’personable’, ’able’, ’the best character’, ’quite skilled gets’, ’awesome’, ’smart’] |
| Cluster 3 | [‘of course the burglar’, ‘a thief’, ’a thief go’, ‘to a burglar’, ‘to a thief’, ‘to a thief’, ‘the burglar’, ‘their designated burglar’, ‘could a burglar’, ‘of course the burglar’, ‘a Burglar’, ‘a Burglar’] |
| Cluster 4 | [‘a respectable hobbit’, ‘a respectable Hobbit’, ‘a sensible Hobbit’, ‘a clean well mannered hobbit’, ‘a respectable Hobbit’, ‘a sensible Hobbit’, ‘a proper hobbit’] |
| Cluster 5 | [‘small’, ‘small’, ‘little’, ‘small’, ‘little’] |
| Cluster −1 | [‘rich’, ‘the right man’, ‘a feisty character’, ‘the uncle of Frodo’, ‘unbelievably lucky’, ‘the perfect example of success’, ‘nostalgic’, ‘middle aged’] |
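The labelling step described in the caption (most frequent monograms per cluster) can be sketched as a simple word count. This is a minimal illustration, not the authors' implementation: the function name `label_clusters`, the small stopword set and the `top_k` parameter are assumptions, and the example phrases are a subset of clusters 3 and 4 above.

```python
from collections import Counter

# Hypothetical stopword list; the source does not say which words are filtered.
STOPWORDS = frozenset({"a", "an", "the", "of", "to", "course"})

def label_clusters(clusters, top_k=1):
    """Map each cluster id to its top_k most frequent single words.
    Cluster -1 (noise) is skipped."""
    labels = {}
    for cluster_id, phrases in clusters.items():
        if cluster_id == -1:
            continue
        counts = Counter(
            token
            for phrase in phrases
            for token in phrase.lower().split()
            if token not in STOPWORDS
        )
        labels[cluster_id] = [word for word, _ in counts.most_common(top_k)]
    return labels

example = {
    3: ["of course the burglar", "a thief", "the burglar", "a Burglar"],
    4: ["a respectable hobbit", "a sensible Hobbit", "a proper hobbit"],
    -1: ["rich", "middle aged"],
}
print(label_clusters(example))
```

Lower-casing before counting merges variants such as ‘Hobbit’ and ‘hobbit’, which matter in clusters as small as these.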
Figure 1. Expanded story network graphs. The expanded story networks for the five literary works: nodes that represent characters in the story are in green, while the actants extending the original character story network are in orange. (a) Of Mice and Men: the node ‘steinbeck’ has an in-degree of 0, suggesting readers’ understanding of the author’s impact on creating complex story actors, while the actants have no meaningful return engagement. Similarly, the ‘place’ node cannot directly effect causal change in the story and as a result is very rarely found in the subject part of a relationship (the out-degree is 0). (b) Frankenstein: the subnetwork of ‘letters’, ‘author’ and ‘novel’ indicates that readers recognize the epistolary nature of Frankenstein. The common node ‘people’ (which is found in most of the graphs) represents the reviewers’ perception of other reviewers. (c) To Kill a Mockingbird: important and intangible actants such as ‘racism’, ‘lawyer’ and ‘personality’ compose the extended story network nodes in this graph. The ‘personality’ node reflects the novel’s dedication to character development, be it of ‘scout’, ‘atticus’ or even ‘arthur’. (d) The Hobbit: the readers’ classification of the novel’s genre is immediately apparent in the nodes ‘adventures’, ‘quest’, ‘home’ and ‘journeys’. Inanimate actants such as ‘home’, ‘journey’, ‘quest’ and ‘ways’ typically have a very low out-degree (in this case 0), whereas ‘tolkien’ has a very low in-degree. The node ‘ways’ signals strategy: ‘Dwarves’ have ‘ways’ or ‘Bilbo Baggins’ took ‘ways’. (e) Animal Farm: the nodes ‘rebellion’ and ‘revolution’, in conjunction with the nodes ‘power’ and ‘control’, highlight the sustained themes of power struggle, social dynamics and politics that lie at the ideological root of the novel. The author ‘orwell’ once again has a high out-degree, and the node ‘ways’ once again signals strategy: ‘hens’ think of ‘ways’ and ‘pigs’ wanted ‘ways’.
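The in-degree and out-degree observations in this caption can be reproduced from a list of subject-to-object relationships. The sketch below is illustrative only: `node_degrees` is a hypothetical helper, and the toy edge list is invented to mirror the pattern described for Of Mice and Men rather than taken from the paper's data.

```python
from collections import defaultdict

def node_degrees(relationships):
    """Return {node: (in_degree, out_degree)} for directed
    (subject, object) pairs; duplicate pairs count once."""
    degrees = defaultdict(lambda: [0, 0])
    for subject, obj in set(relationships):
        degrees[subject][1] += 1  # subject acts: out-degree
        degrees[obj][0] += 1      # object is acted upon: in-degree
    return {node: tuple(d) for node, d in degrees.items()}

# Toy edges (assumed, for illustration only).
toy_edges = [
    ("steinbeck", "lennie"), ("steinbeck", "george"),
    ("george", "lennie"), ("lennie", "george"),
    ("george", "place"), ("lennie", "place"),
]
deg = node_degrees(toy_edges)
sources = sorted(n for n, (i, o) in deg.items() if i == 0)  # never an object
sinks = sorted(n for n, (i, o) in deg.items() if o == 0)    # never a subject
print(sources, sinks)
```

In this toy graph the author node appears only as a subject and the ‘place’ node only as an object, which is the signature the caption attributes to meta-actants and inanimate actants.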
Figure 2. The Hobbit. The complete event sequence network: nodes with degree >2 are shaded turquoise and edges verifiable by at least two review samples are in red.
Figure 3. The Hobbit. Highest total node degree event sequence in the network; edges such as the shaded edge shown are removed during sequencing.
Figure 4. The complete event sequence network of the remaining four novels. Nodes with degree greater than 2 are shaded turquoise and edges verifiable by at least two review samples are in red. The graphs can be further explored on our data repository [59]. (a) Of Mice and Men. Critical events such as «Lennie» killing «puppy», «Lennie» killing «Curley's wife» and «George» killing «Lennie» occur in succession in the story and this sequencing is reflected in the reviewers’ posts. (b) Frankenstein. The network correctly captures the order of important events including: «Frankenstein» creates «monster» → «monster» kills «Elizabeth» → «Frankenstein» chases «monster». We also capture Frankenstein’s regret upon his creating the monster. «Frankenstein» abandons «monster» occurs before his creating the monster, which is a false positive. (c) To Kill a Mockingbird. The event, «Atticus» nurtures «Jem» and «Scout», appears in the early part of the event sequence. The core sequencing structure is also verifiable: «Tom» is accused of raping «Mayella» → «Atticus» defends «Tom» → «Bob» attacks «Jem» → «Boo» saves «Jem» → «Boo» carries Jem to «Atticus». (d) Animal Farm. The story has several characters and routinely introduces events that have past precedent: the windmill has to be constructed twice; there are several uprisings on the farm; the commandments are changed multiple times. The character set promises to enlarge the event sequence network, while the similarities drawn between entities and their relationships in recurrent situations make event sequence network generation a challenge. Nevertheless, we extract many useful sub-sequences including: «snowball» is exiled from «farmhouse» → «napoleon» attacks snowball with «pigs» → «napoleon» chases «snowball» → «napoleon» chases snowball off «farmhouse» → «pigs» change «commandments».
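The idea of building a consensus event order from partial trajectories can be illustrated with a simple heuristic. The sketch below is not REV2SEQ, whose details are not given here. It is a stand-in that counts how many reviews place one event before another, adds precedence edges from strongest to weakest support while skipping any edge that would close a cycle, and then reads off a topological order. The function name and the short event labels are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations

def consensus_sequence(trajectories):
    """Greedy consensus ordering of events from per-review partial orders."""
    support = Counter()
    for traj in trajectories:
        for a, b in combinations(traj, 2):  # a appears before b in this review
            if a != b:
                support[(a, b)] += 1

    graph = defaultdict(set)
    nodes = {event for traj in trajectories for event in traj}

    def reaches(src, dst):
        # Depth-first search over edges accepted so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return False

    # Strongest support first; drop edges that contradict accepted ones.
    for (a, b), _ in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        if not reaches(b, a):
            graph[a].add(b)

    # Kahn's algorithm, with alphabetical tie-breaking for determinism.
    indegree = {n: 0 for n in nodes}
    for a in list(graph):
        for b in graph[a]:
            indegree[b] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
        ready.sort()
    return order

reviews = [
    ["create", "kill"], ["kill", "chase"], ["create", "chase"],
    ["kill", "create"],  # a contradictory review, outvoted below
    ["create", "kill"],
]
print(consensus_sequence(reviews))
```

Dropping the weaker edge of a contradictory pair plays the same role as the edge removal mentioned in the Figure 3 caption, although the actual removal criterion used by the authors may differ.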
Performance of the sequencing algorithm REV2SEQ on the five stories: the error margins were computed by estimating bounds for each score by replacing the labels marked X or unsure with all 0s or all 1s.
| story | ||
|---|---|---|
| 92.35 ± 5.29 | 94.12 ± 5.88 | |
| 75.75 ± 5.45 | 77.27 ± 1.51 | |
| 88.00 ± 4.00 | 90.00 ± 3.34 | |
| 95.71 ± 4.28 | 100.00 ± 0.00 | |
| 78.57 ± 4.50 | 80.77 ± 2.74 |
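The bounding procedure in the table caption can be sketched as follows. The sketch assumes that each evaluated item carries a label of 1 (correct), 0 (incorrect) or unsure, that the score is the percentage of items labelled 1, and that the reported value is the midpoint of the two bounds with the half-width as the margin. The midpoint convention and the name `score_with_margin` are assumptions, not confirmed by the source.

```python
def score_with_margin(labels):
    """labels: list of 1, 0, or None (None = marked X / unsure).
    Returns (midpoint, half_width) in percent, where the bounds come from
    treating every unsure label as 0 (lower) or as 1 (upper)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    n = len(labels)
    correct = sum(1 for label in labels if label == 1)
    unsure = sum(1 for label in labels if label is None)
    lower = 100 * correct / n             # all unsure labels set to 0
    upper = 100 * (correct + unsure) / n  # all unsure labels set to 1
    return (lower + upper) / 2, (upper - lower) / 2

# Eight correct, one incorrect, one unsure.
print(score_with_margin([1] * 8 + [0] + [None]))
```

With no unsure labels the two bounds coincide and the margin is zero, which matches the 100.00 ± 0.00 entry in the table.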
Figure 5. The (symmetric) heatmap for the character ‘Victor Frankenstein’. The similarity scores between clusters of impressions labelled by the row/column headers are computed by algorithm 3. The sub-matrices that are deep red or blue imply a hierarchical structure to the mutual similarity or dissimilarity between groups of impression clusters. The diagonal entries are +2 as a cluster of impressions is most similar to itself.
Figure 6. The (asymmetric) heatmap comparing the character ‘Victor Frankenstein’ from Frankenstein and ‘Atticus Finch’ from To Kill a Mockingbird. The similarity scores between clusters of impressions labelled by the row/column headers are computed by algorithm 3. The colour coding of impression clusters suggests valuable information stored in these representations about pairwise character similarity across novels, capturing the readers’ process of aligning impressions from one novel to impressions created while reading another novel.