Neil Cohn, Center for Research in Language, University of California San Diego, La Jolla, CA, USA
Abstract
How do people make sense of the sequential images in visual narratives like comics? A growing literature of recent research has suggested that this comprehension involves the interaction of multiple systems: The creation of meaning across sequential images relies on a "narrative grammar" that packages conceptual information into categorical roles organized in hierarchic constituents. These images are encapsulated into panels arranged in the layout of a physical page. Finally, how panels frame information can impact both the narrative structure and page layout. Altogether, these systems operate in parallel to construct the Gestalt whole of comprehension of this visual language found in comics.
Comics have conveyed static drawn visual narratives for over a century, and growing research suggests that sequential images combined with text are an effective tool of communication and education (e.g., Nakazawa, 2005; Nalu and Bliss, 2011; Short et al., 2013), beyond just being entertainment. While theories about comics have been scattered in the humanities for several decades (for review, see Nöth, 1990; Cohn, 2012), only recently has scientific attention turned toward investigating just how readers comprehend complex graphic displays of sequential images. This growing literature of both theoretical and empirical research has established that extracting meaning from a comic page involves multiple interacting systems, analogous to the organization of a linguistic system (Cohn, 2013b): A graphic structure encodes the physical lines and shapes that compose the images, which construct meaningful expressions using a lexicon of stored graphic schemas. A narrative structure organizes these sequential images into a coherent message, while an external compositional structure arranges these panels across the physical layout of a page.
Altogether, these structures comprise the “visual language” that underlies comics, manga, graphic novels, and other visual narratives, which may also interface with text in larger multimodal interactions. Here, we focus on the systems most involved with sequential comprehension of a page: the narrative structure and the external compositional structure, which may be mediated by an attentional framing structure.
Narrative structure: The system that packages meaning at a discourse level. This “visual narrative grammar” assigns categorical roles to images based on prototypical correspondences with a conceptual structure of meaning. These narrative units are organized into hierarchic constituents that allow for various types of embedding.
External compositional structure: The structure governing the organization of the physical layout of comic pages. These structures most often divide pages into horizontal and vertical constituents, though they also allow inset panels to be enclosed within a larger dominant panel, and Gestalt relations such as staggered, overlapping, and separated panels.
Attentional framing structure: The constraints on how conceptual information gets framed into panel units, determining how much content they contain. This has ramifications for how those images act in a narrative and how they are organized in a page layout (ECS).
Visual narrative grammar
The question that has received the most attention regarding the visual language used in comics has been: How is meaning conveyed by a sequence of images? Early theories have focused on linear semantic changes between images (McCloud, 1993; Saraceni, 2003), consistent with prevailing theories of discourse structure (Halliday and Hasan, 1976). As a comprehender progresses through a discourse, they consistently monitor dimensions of time, characters, spatial location, and causality. Change in these dimensions requires an updating of the mental model being built from the complete understanding of the discourse (van Dijk and Kintsch, 1983; Zwaan and Radvansky, 1998), and inference for meaning left unseen (McCloud, 1993; Saraceni, 2003). Experiments have yet to examine these theories in the online comprehension of static visual narratives like comics, but research with film has confirmed that viewers can consciously identify these semantic shifts between individual film shots (Magliano et al., 2001; Zacks et al., 2009; Magliano and Zacks, 2011).
While empirical evidence supports that readers track semantic changes between linear image relationships, this approach alone cannot explain the comprehension of visual narratives. Problems with linear relationships first arose because of observations that non-adjacent panels sometimes necessitate long-distance connections in a sequence and panels often form meaningful groupings beyond linear relationships. Such intuitions aligned with empirical work showing that participants highly agree on where to divide sequential images into episodic constituents (Gernsbacher, 1985). The first alternative approach proposed a hierarchic model that created constituents based on changes of spatial viewpoint on a scene, changes between characters, or changes in time (Cohn, 2003, 2010). This approach revealed that linear relations between panels might be structurally ambiguous in ways explainable by underlying hierarchic structures (Cohn, 2003, 2013c). These basic groupings eventually gave way to observations that panels play functional roles in a sequence, similar to, yet somewhat different from, traditional narrative categories (e.g., Freytag, 1894; Mandler and Johnson, 1977). The resulting theory has been named “Visual Narrative Grammar” (Cohn, 2013c).
Visual Narrative Grammar (VNG) posits that, analogous to the way that sequential words take on grammatical roles that embed within a constituent structure in sentences, sequential images take on narrative roles that embed within a constituent structure in visual narratives (Cohn, 2013c). This is similar to previous “grammatical” approaches to narrative and discourse, such as the story grammars from the 1970s (e.g., Mandler and Johnson, 1977), yet these models differ in important ways (see Cohn, 2013c, for more details). It is important to stress that the comparison between narrative grammar and syntax is an analogy at the architectural level: images do not serve as nouns or verbs, and they convey information at a higher level than words (indeed, at a discourse level). Yet, narrative grammar uses a similar structural architecture as syntax, and these constructs are believed to operate in comprehension similar to the processing of syntactic representations. Whether these proposed similarities tie to common cognitive mechanisms is an active line of research.
VNG uses basic narrative categories to organize sequences: Establishers passively introduce the relationships between entities; Initials depict the start of an event or interaction; Peaks show a climax; and Releases depict a resolution or coda of events. While these categories form the core of a canonical narrative arc, other categories elaborate on a sequence, be it through additional narrative categories (Prolongations, Orienters), modification of the primary categories (Refiners, Perspective Shifts), or modification of the constituent structures (Conjunction) (Cohn, 2013b,c). Here, we will focus on the basic properties of VNG through an example sequence.
Consider Figure 1A, from the comic Sinfest (www.sinfest.net) by Tatsuya Ishida. An Establisher starts the sequence, passively introducing the relationship between the cat and the tree. The cat then begins his motion in the second panel, an Initial, climaxing as he reaches the tree branch in triumph, a Peak. Another Establisher then introduces the relationship between cat and dog, again with a passive state. The dog attempts to climb the tree (Initial), but he falls to the ground (Peak), resulting in the cat making fun of him (Release), a resolution to the dog's actions. The next panel Establishes a relationship between the dog and the stump, which he then hops onto (Initial) and assumes a protective role in a final climax (Peak).
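The analysis of the Sinfest sequence above can be sketched as a small data structure. This is only an illustrative sketch, not an implementation from the VNG literature: the class names, the helper `build`, and the group labels (`cat_climbs`, `dog_falls`, `dog_guards`) are hypothetical, the check covers only the four core categories, and the rule that core categories may be omitted but not reordered or repeated within a constituent is a simplifying assumption. The text does not state which roles the three groupings play at the higher level, so none are assigned here.

```python
from dataclasses import dataclass, field

# Core categories in the order the text lists them (assumed canonical order).
CANONICAL_ARC = ["Establisher", "Initial", "Peak", "Release"]


@dataclass
class Panel:
    number: int
    role: str  # narrative category assigned to this panel


@dataclass
class Constituent:
    """A narrative grouping whose children are panels or smaller constituents."""
    children: list = field(default_factory=list)

    def panels(self):
        """Flatten the hierarchy back into the linear panel sequence."""
        out = []
        for child in self.children:
            if isinstance(child, Constituent):
                out.extend(child.panels())
            else:
                out.append(child)
        return out


def follows_canonical_arc(constituent):
    """True if the direct panel children appear in canonical order.

    Simplifying assumption: categories may be omitted, but none may
    repeat or appear out of order. Only core categories are supported.
    """
    positions = [CANONICAL_ARC.index(child.role)
                 for child in constituent.children
                 if isinstance(child, Panel)]
    return all(a < b for a, b in zip(positions, positions[1:]))


def build(start, roles):
    """Hypothetical helper: number panels consecutively from `start`."""
    return Constituent([Panel(start + i, role) for i, role in enumerate(roles)])


# The three groupings described in the text (labels are illustrative).
cat_climbs = build(1, ["Establisher", "Initial", "Peak"])
dog_falls = build(4, ["Establisher", "Initial", "Peak", "Release"])
dog_guards = build(8, ["Establisher", "Initial", "Peak"])
sequence = Constituent([cat_climbs, dog_falls, dog_guards])
```

Flattening `sequence` recovers the ten panels in reading order, while the nesting records which panels group together, which is the information a purely linear account of panel-to-panel transitions leaves out.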
Figure 2. (A) Sequences manipulating narrative grammar, semantic associations, or both, which are similar to the stimuli in Cohn et al. (2012a). The narrative structure for this Normal sequence is shown, and it is matched by the Structural Only sequence. (B) Reaction times to target panels in these sequence types, and (C) event-related potentials showing an N400 effect to panels in these sequences.
A second experiment in this study presented these same stimuli while recording ERPs. N400 effects were larger to panels from structural-only and scrambled sequences, intermediate to panels from semantic-only sequences, and the smallest to those from normal sequences (Figure 2C). These results suggest that the presence of narrative structure in structural-only sequences was not enough to attenuate the amplitude of the N400 effect, a waveform associated with semantic processing. Thus, while semantic information (including linear changes in coherence) clearly plays a role in the processing of sequential images, it does so in combination with a narrative grammar.
In addition, the amplitude of the N400 effect was attenuated across the ordinal position of normal sequences: the largest amplitudes appeared at the start of the sequence and became smaller as the sequence progressed. Because no such attenuation was found in other sequence types, this indicated both structure and meaning allowed for a build-up of meaning across a sequence. These findings again paralleled ERP results in analogous research of sentence processing (Van Petten and Kutas, 1991), and they also align with behavioral research showing that participants view images at the outset of a sequence slower than those later in the sequence (Gernsbacher, 1983; Cohn and Paczynski, 2013; Cohn, 2014). At the start of the sequence, readers may need more time to “lay a foundation” (Gernsbacher, 1985) of knowledge for the rest of the sequence (as in the function of an Establisher), which then allows for faster viewing (or attenuated N400 effects) as meaningful information accrues throughout the narrative.
Finally, though Cohn et al. (2012a) found no difference in the N400 effect between panels in structural-only and scrambled sequences, a negativity between these waveforms did appear in a localized left anterior region of the scalp. This distribution across the scalp differed distinctly from the more widespread negativity shown to the N400 effect, and was hypothesized to be similar to the left anterior negativity (LAN) effect evoked by violations of syntactic structure in sentences (Neville et al., 1991; Friederici et al., 1993). This left anterior effect was also correlated with a measure of participants' comic reading expertise: the more experience participants had, the larger the difference between these brain responses. Expertise effects like these are not unprecedented: the ability to accurately arrange images in a sequence and to infer missing panels correlates with both age and experience reading comics (Nakazawa, 2005; Nakazawa and Shwalb, 2012). Thus, not only do comprehenders utilize a narrative grammar in understanding sequential images, but such comprehension is modulated by their “fluency” in this visual language.
External compositional structure
Separate from the content of a visual narrative, actual comics arrange panels physically on a page. Navigating this “external compositional structure” (ECS) of page layout cannot rely on the meaningful content of the panels, since a single sequence can be arranged into numerous layouts with no effect on its meaning, as in Figure 3. This sort of rearrangement typically happens to comic strips when formatted for newspapers: they might appear as a horizontal strip, a vertical stack, or a four-panel grid. Unless such changes alter the actual order in which panels are read, they impact only the ECS, with no change in the conceptual/narrative structure. Moreover, data from eye-tracking experiments have shown that readers do not explore various potential pathways before progressing panel-by-panel (Nakazawa, 2002; Omori et al., 2004; Chiba et al., 2007), indicating that panel content does not provide the main motivation for their reading order (though an alternate order may be chosen if content confounds the intended order). For these reasons, ECS uses principles separate from those of the narrative/conceptual structures.
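The independence of layout from content described above can be illustrated with a toy model of ECS as nested horizontal and vertical constituents. This is a minimal sketch under stated assumptions: the `Row` and `Column` classes and the `reading_order` function are hypothetical names, and the traversal assumes rows are read left to right and columns top to bottom, which the text does not specify. Insets and staggered or overlapping panels are not modeled.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Row:
    """Horizontal constituent: children are read left to right (assumed)."""
    children: List["Node"]


@dataclass
class Column:
    """Vertical constituent: children are read top to bottom (assumed)."""
    children: List["Node"]


Node = Union[int, Row, Column]  # an int stands for a panel number


def reading_order(node: Node) -> List[int]:
    """Traverse the layout tree and return panel numbers in reading order."""
    if isinstance(node, int):
        return [node]
    order: List[int] = []
    for child in node.children:
        order.extend(reading_order(child))
    return order


# Three layouts of the same four-panel strip, as in the newspaper example.
strip = Row([1, 2, 3, 4])
stack = Column([1, 2, 3, 4])
grid = Column([Row([1, 2]), Row([3, 4])])

# A layout that places the same panels in two side-by-side columns.
two_columns = Row([Column([1, 3]), Column([2, 4])])
```

Under these assumptions the strip, stack, and grid all yield the order 1, 2, 3, 4, so only the ECS differs between them. The two-column arrangement yields 1, 3, 2, 4, which is the kind of layout change that would alter the order in which panels are read.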
We saw above how altering the framing of panels might change a sequence's layout, but framing might also impact the narrative. For example, framing might determine how many characters appear in a panel, as in Figure 3B in panels 4.1/4.2 or 7.1/7.2: Should two characters at a single narrative state be shown together in a single panel, or should those characters be broken up, each into their own panel? These alterations still do not necessarily change the meaning (semantics) of the sequence, though they do alter the pacing (narrative) and the layout (ECS), and thus aspects of framing seem to operate in between these other structures.
First, individual panels frame how much information is depicted in a scene. In a sense, the panel borders simulate a “window of attention” that frames only the content an author wants the reader to assimilate. Information that is not directly depicted in panels is either not important or meant to be inferred. Panels therefore act as “attention units” that can be categorized based on how much information they contain, as depicted in the “attentional framing matrix” in Figure 4 (Cohn, 2007, 2013b). In addition, framing intersects with ECS. A single image could be split up into multiple divisional panels, where the larger image is recognized because of image constancy, but the component parts individuate certain characters. Likewise, inset panels may frame information within a larger dominant panel, again to focus attention on that element.
As demonstrated, sequential images involve several structures operating independently of each other, yet all interfacing together. For the example Sinfest comic, these connections can be traced between panel numbers across figures. These tree structures are not isomorphic: the constituents in narrative structure do not cleanly align with those from the ECS. For example, in the original layout, the Release of the second narrative constituent (panel 7) starts the third horizontal tier rather than ending a previous tier. Thus, narrative constituent boundaries do not always line up with the boundaries of the physical layout.
This “parallel architecture” of narrative structure and ECS is analogous to the organization of language, where each linguistic substructure (phonology, syntax, semantics) operates with its own principles, yet interfaces with the others to form the whole of linguistic knowledge (Jackendoff, 2002). Because these components are separate, one structure can change while the others remain the same. For example, different layouts can convey the same meaningful content (as in Figure 3B), or the reverse, the same layout could be used for different content.
Future research can better explore the interactions between these structures, such as the mappings that may exist between narrative and layout. Locative information often coincides with the first panel of a page, and suspenseful panels (Initials) often occur at the final panel on a page, thereby inducing a thrilling page turn and subsequent reveal of primary information (Peaks) on the next page (McCloud, 2000). Narrative arcs may alternatively conclude at page borders, thereby using the page layout as a break between constituents. Also, panels that occupy whole “splash pages” are likely to be Peaks, since the large size should echo a climactic moment of the narrative. Inset panels often zoom in on information in a larger panel (“Refiners”), or depict additional characters in the broader scene from the dominant panel (“Environmental-Conjunction”) (see Cohn, 2013b,c). These mappings between narrative and layout could be explored through corpus analyses of comic pages and experimental manipulation.
Beyond these structural interfaces, we can also explore how these structures interact in comprehension. Can changes in content force readers to navigate a page in ways that go against their preferred rules? Do readers prefer boundaries between narrative constituents to line up with the boundaries in ECS? What changes in layout might confuse readers about the meaning of the narrative structure? These and other questions can frame future experimentation on the relationship between these structures.
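The mismatch between narrative and layout boundaries noted for the Sinfest example suggests one simple measure that a corpus analysis might compute. The sketch below is illustrative only. The narrative grouping follows the analysis given earlier in the text, but the tier assignment is hypothetical: the text states only that panel 7 starts the third horizontal tier, so where the first tier ends is an assumption. The function name `boundaries` is also hypothetical.

```python
def boundaries(groups):
    """Return the panel numbers after which a group ends.

    The end of the final group is excluded, since the end of the
    whole sequence is trivially a boundary in both structures.
    """
    return {group[-1] for group in groups[:-1]}


# Narrative constituents as analyzed in the text.
narrative = [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]

# Hypothetical horizontal tiers; only "panel 7 starts tier 3" is from the text.
tiers = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]

narrative_breaks = boundaries(narrative)
layout_breaks = boundaries(tiers)

aligned = narrative_breaks & layout_breaks         # breaks shared by both
narrative_only = narrative_breaks - layout_breaks  # narrative break inside a tier
layout_only = layout_breaks - narrative_breaks     # tier break inside a constituent
```

With this hypothetical tiering, the break after panel 3 is shared, the narrative break after panel 7 falls inside a tier, and the tier break after panel 6 falls inside a narrative constituent. Counting such cases across many pages would be one way to ask whether layout boundaries tend to coincide with narrative boundaries.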
Conclusion
While concerted scientific research on visual narratives has only begun to emerge, these initial forays have shown the advantage of a multilayered approach that balances theoretical modeling, corpus analysis, and empirical experimentation using both behavioral and neurocognitive measures. Altogether, this work has provided evidence for the interactions of narrative, meaning, page layout, and framing, and has suggested that familiarity with these structures contributes to a larger fluency in the visual language used in comics.
Conflict of interest statement
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.