
Acquiring Complex Communicative Systems: Statistical Learning of Language and Emotion.

Ashley L Ruba¹, Seth D Pollak¹, Jenny R Saffran¹.

Abstract

During the early postnatal years, most infants rapidly learn to understand two naturally evolved communication systems: language and emotion. While these two domains include different types of content knowledge, it is possible that similar learning processes subserve their acquisition. In this review, we compare the learnable statistical regularities in language and emotion input. We then consider how domain-general learning abilities may underlie the acquisition of language and emotion, and how this process may be constrained in each domain. This comparative developmental approach can advance our understanding of how humans learn to communicate with others.


Keywords: Development; Emotion; Infancy; Language; Statistical learning


Year: 2022 PMID: 35398974 PMCID: PMC9465951 DOI: 10.1111/tops.12612

Source DB: PubMed Journal: Top Cogn Sci ISSN: 1756-8757

Introduction

Infants are faced with a staggering problem: they must learn to effectively communicate with other humans in their world, without explicit instructions about how to do so. To communicate with others, infants must learn to understand the complex signals sent by others and begin to generate interpretable signals of their own. Despite the scope of this challenge, neurotypical infants rapidly learn two naturally evolved communication systems, language and emotion, during the first few years of postnatal life. While these two domains include different types of content knowledge, it is possible that similar learning processes underlie their acquisition. In this review, we consider how domain‐general statistical learning abilities may support the acquisition of language and emotion. We focus on how infants come to understand input in each domain, rather than how infants learn to produce communicative signals. Specifically, we focus on how infants learn to understand foundational and early‐acquired aspects of emotion and language (such as the meanings of facial configurations and words). This initial knowledge provides infants with new ways of representing the world, setting the stage for acquiring more complex and sophisticated aspects of language and emotion (e.g., pragmatics, complex syntax, social display rules, theory of mind). Our comparative developmental approach has the potential to further illuminate the nature of human learning.

Input to learners of complex communicative systems

As adults, we tend to think of our linguistic and emotional communication systems as objective and consistent, but it is unlikely that either system appears that way to an infant. Why should we preferentially attend to faces rather than fingers when discerning another person's emotional state? When listening to someone talk, why attend to vocal prosody and not eye blinks? The developing child learns to ignore a vast amount of available, but irrelevant, perceptual information, in order to focus on the most relevant regularities (e.g., infants learn that a sneer conveys emotional information, while a sneeze does not). Infants also learn to generalize and detect these relevant regularities over vast individual differences in people's voices, faces, personalities, genders, ages, and other person‐specific features. In this regard, infants’ linguistic and emotional environments can be characterized, simultaneously, by both a richness and poverty of available input. The input is rich in that it contains massive amounts of linguistic and emotion‐relevant cues. Yet the input is also impoverished in that it is largely unlabeled and incredibly noisy (both statistically and perceptually). From this complex assemblage of input, infants learn to formulate abstract meanings, categories, inferences, and generalizations about language and emotion. Below, we consider specific characteristics of the input available to young learners in each domain.

Input to infant language learners

From decades of research in linguistics and other areas of cognitive science, we know that all natural languages consist of structured sequences of sounds or signs, organized in a limited number of ways. These include sounds or sign features (e.g., phonemes), phonology (patterns of phonemes and other “musical” aspects of a language, like rhythm), morphemes (the smallest units of meanings), words (consisting of one or more morphemes), lexical categories (e.g., nouns and verbs), meanings (semantic representations), and syntax (patterns or rules built over lexical categories and other such elements). Starting from the simplest and most perceptually available components of this input (i.e., sounds in spoken languages; handshapes and movements in sign languages), infants arrive at the richer and more abstract components of language (e.g., grammars), which permit generalization beyond the input that has been received. Yet, there is still substantial debate regarding innate knowledge about language (for a range of recent perspectives, see Christiansen & Chater, 2015; Lasnik & Lidz, 2016; Linzen & Baroni, 2021; Pearl, 2021; Perfors, Tenenbaum, & Regier, 2011; Shi, Legrand, & Brandenberger, 2020). Regardless of one's stance in this debate, infants are faced with complex learning problems, including learning idiosyncrasies of a specific language. Moreover, these problems change over the course of language learning. For example, an infant cannot begin to learn their native language's syntax until they have learned some of its words (Saffran & Wilson, 2003). The challenges facing language learners are thus underscored by the simultaneous richness and poverty of linguistic stimuli. From the perspective of the richness of the stimuli, part of infants’ challenge is to sift and winnow language input to discern relevant cues to native language structure. An example of this process comes from the varied ways in which pitch is used across languages. 
An English word spoken by two individuals (e.g., a female child vs. an adult male) may differ widely in pitch but have the same meaning. However, tonal languages like Mandarin or Hmong incorporate systematic pitch differences that completely alter the meanings of words. Thus, the structure of pitch in linguistic input varies from language to language, and infants must either learn to ignore this variation (if learning a non‐tonal language) or track it (if learning a tonal language) (Hay, Graf Estes, Wang, & Saffran, 2015; Quam & Swingley, 2010). A similar example is found with phonemes. Although individuals pronounce phonemes differently across languages, infants are initially able to differentiate all phonemes regardless of their native language(s). With experience, however, infants have difficulty distinguishing phonemes in their non‐native languages (Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992; Werker & Tees, 2002). This “perceptual narrowing” process (also observed in other domains; Hannon & Trehub, 2005; Maurer & Werker, 2014; Pascalis et al., 2014) allows infants to expertly perceive relevant sound distinctions while ignoring other, less relevant distinctions (Kuhl et al., 2006; Maye, Werker, & Gerken, 2002). Language input is also impoverished in many ways. Infant language learning is largely unsupervised, in that most input is unlabeled. Caregivers do not point out word boundaries in continuous speech or the meanings of many words that infants hear (consider the implausibility of a caregiver explaining the meanings of “the” or “it”). Caregivers also do not label lexical categories for infants (e.g., whether “book” is a noun or verb), and these categories are not clearly signaled by observable semantic information (e.g., how does “love” map onto the visible world?). Even in situations that seem clear, there is still extensive referential ambiguity. 
When an infant hears a given word (e.g., “dog”) and sees an adult point at the animal, the word could refer to any number of observable features of an object—its color, its ears, the whole animal, the action in which the animal is engaged, etc. (Quine, 1960). Given that infants’ language production lags months or even years behind their language comprehension, much of their knowledge is unobservable to caregivers. It is difficult to determine what words an infant understands and whether infants have correctly construed meanings and grammatical regularities. Errors are sometimes visible in children's language production, but caregivers seldom provide explicit negative feedback (i.e., caregivers do not correct infants’ grammatical errors; Brown & Hanlon, 1970; Morgan & Travis, 1989). Infants learn their native language(s) largely in the absence of any explicit instruction. Many additional limitations in the input also exist, as highlighted in classic linguistic arguments about the “poverty of the stimulus” (Chomsky, 1965). Yet, despite the complexity of the linguistic environment, infants nevertheless uncover key structures in language input with impressive speed.

Input to infant emotion learners

While language researchers have benefited from the field of linguistics, emotion researchers do not have an analogous discipline. For this reason, much less is known about the range, frequency, types, and extent of cultural variation in emotion cues, or the regularity of cues that infants encounter. Emotions are a social construction that refers to instances of feelings. There are many ways that humans can feel things, and there is no shared agreement among scholars about which feelings ought to be considered as “emotions” versus something distinct from emotions, such as reflexes, sensations, or sentiments. For this reason, some definitions of emotion include hunger, warmth or coldness, physical pain, desire, fatigue, thirst, interest, and the experience of touching something, whereas other definitions exclude these feelings from the construct of emotion. Nonetheless, there is shared agreement that an instance of emotion is when we categorize, organize, or otherwise make sense of the unfolding of our feelings.
Generally, the input for emotion learning includes various types of cues—such as paralanguage (e.g., sighs, grunts, giggles; Friend, 2000; Woodard, Plate, Morningstar, Wood, & Pollak, 2021), facial movements (e.g., activation of one or more facial muscles; Barrett, Adolphs, Marsella, Martinez, & Pollak, 2019), skin coloration (e.g., changes in blood flow and oxygenation contributing to blushing or pallor; Thorstenson, 2018; Thorstenson, McPhetres, Pazda, & Young, 2021), non‐facial body movements (e.g., tightening or raising of fists, crossed arms, slumped shoulders; Aviezer et al., 2008; Witkower, Hill, Koster, & Tracy, 2021), behavioral action tendencies (e.g., approach, withdrawal; Adams, Ambady, Macrae, & Kleck, 2006), emotion language (words and phrases that denote affective states; Hoemann, Xu, & Barrett, 2019; Lakoff, 2016; Ruba, Meltzoff, & Repacholi, 2020b)—and the context (i.e., the broader social situation) in which any of the aforementioned actions occur (Ruba & Pollak, 2020; K. E. Smith, Leitzke, & Pollak, 2020). Starting from relatively simple and perceptually available components of this input (e.g., a sharp or gentle voice, an open or closed mouth), infants must arrive at abstract components of emotion (e.g., subjective, internal states and predictions about others’ actions). As with research on language, there are still ongoing debates regarding potential innate specifications of emotion, and whether certain emotions are universally experienced and signaled (Barrett et al., 2019; Cowen, Sauter, Tracy, & Keltner, 2019). Nevertheless, there is agreement that displays of emotion vary across cultures and subcultures, and infants must learn complex rules depending on the idiosyncrasies of their social and cultural group (e.g., whether to mask/suppress a negative emotional response; Crivelli, Russell, Jarillo, & Fernández‐Dols, 2016; Elfenbein & Ambady, 2002; Matsumoto, 1993). Our account is not limited to any particular type of emotion. 
Rather, our concern is how children learn to organize their feelings into whatever meaningful categories of emotion exist in their social world to make sense of their experiences and facilitate communication. Contrary to popular belief, emotion categories such as sadness, happiness, anger, fear, disgust, and surprise are not “basic” emotions that are highly similar across humans. These categories are merely those that English‐speaking scientists have defined and studied the most frequently. A child exposed to English might use a label of sadness to organize an experience of loss, a speaker of Tagalog might use gigil to make sense of their irresistible urge to pinch or squeeze something cute (Lomas, 2016), and the Ilongot might categorize their feeling of an exuberant surge or burst of energy as liget (Wierzbicka, 1992). Rather than focusing on any subset of emotion linguistic categories, our aim is to understand how children develop the capacity to infer emotional meaning from observable cues in their environments. Like language input, emotional input is characterized by vast richness. Other people's feelings cannot be directly observed, and no single muscle movement, behavior, or utterance consistently indicates a specific emotional state (Barrett et al., 2019). We can smile when we feel happy, but also when we feel nervous, embarrassed, manipulative, proud, satiated, superior, pained, subservient, irritated, or when we are attempting to mask another emotion (Martin, Rychlowska, Wood, & Niedenthal, 2017; Seyfarth & Cheney, 2003). Similarly, when we feel happy, we may smile, cry “tears of joy,” or try to hide our emotion, particularly if it comes after another person's misfortune. Given the indeterminate nature of these emotion cues, they always need to be interpreted within the larger context in which they are expressed: smiling at a birthday party is quite different from smiling when being scolded.
Emotions are also probabilistic even within a given context: puppies usually spark joy, but perhaps not if one has recently lost their pet or was bitten by a dog. Infants also need to appreciate that cues sometimes used to convey emotional states can also occur for other physical sensations or cognitive states. Crossed arms may indicate that someone feels physically cold, while a furrowed brow may reflect that someone is intensely focused. Moreover, the cause of an emotion could be an observable event in the environment, or an emotion may originate from thoughts or memories, with no observable referent whatsoever. Emotional input is thus characterized by both referential ambiguity and variability. At the same time, emotions reflect some systematicity; they are not random occurrences (although the degree of this systematicity remains hotly debated, see Barrett, 2021; Cowen et al., 2021; Le Mau et al., 2021). Also similar to language, the processes through which infants learn to understand emotion cues are largely unsupervised (Ruba & Repacholi, 2020b). Parents infrequently label emotions in their interactions with children. One content analysis found that, in conversations with 2‐ to 5‐year‐old children, parents frequently use terms for general attitudes (“like”, “love”), but rarely use labels for specific emotions like “happy” (Lagattuta & Wellman, 2002). There is little evidence that caregivers engage in explicit instruction about emotions: “I am smiling because I feel happy” or “I feel happy because it's my birthday” (Brownell, Svetlova, Anderson, Nichols, & Drummond, 2013). Even words that are used as labels for emotions are not clearly signaled by observable information (e.g., it is not readily apparent how “love,” “pride,” “shame,” or “hurt feelings” map onto the visible world). 
Similarly, a child may hear “leave me alone” or “go away” from a sibling, without reference to any emotion category (e.g., does such a statement reflect that the other person is busy, sad, angry, hungry, tired, or something else?). Still, long before they can produce emotion labels, infants attend to and use cues about others’ feelings to guide their own behavior (for recent reviews, see Ruba & Pollak, 2020; Ruba & Repacholi, 2020a).

Comparing language and emotion input

As highlighted above, the problems facing language and emotion learners are similar in their overall structure. Infants must sort through myriad relevant and irrelevant cues to detect meaningful patterns, with virtually no supervised instruction. Yet, the input in these two domains differs in important ways. On the most surface level, language input is temporal (in the case of spoken languages) or both temporal and spatial (sign languages use visual space in a structured way), whereas emotion input is both temporal and spatial. Language input is either largely auditory (as in spoken languages, with additional information provided by the visual world, including speech cues on the face) or entirely visual (as in sign languages), while emotion input is communicated through an ever‐changing mix of sensory modalities. These input differences influence the learning problems that infants face. Aspects of language and emotion input also differ with respect to their abstractness, and, as a result, their predictability. For instance, there are agreed‐upon “correct” labels for referents within languages, such as what should be labeled a “dog” versus a “cat.” Many object labels are reliably used to refer to aspects of the environment. An adult native‐English speaker is unlikely to label a dog as a “cat,” and when they do, children lose trust in the speaker (for a recent review, see Tong, Wang, & Danovitch, 2020). In contrast, while there are cultural norms regarding the appropriateness of certain emotions (e.g., laughing at another person's pain is unkind), there is not a single “correct” way for emotions to be felt and expressed, and there is not a consistent pairing between any discrete event and any specific emotion or set of emotions. In this way, the co‐occurrences of expressive behaviors and internal affective states may be less predictable than the co‐occurrences of labels and referents. 
Some aspects of language are similarly abstract (e.g., the meaning of “the” or “truth”) or inconsistent (e.g., what is described as “yummy” may vary across individuals). Yet, they are nevertheless predictable once the child has learned about them (e.g., “the” precedes nouns in English, providing a clue to the syntactic category of “truth” even without knowing its meaning). Despite the complexity of the input in each of these two domains, most infants learn to understand aspects of others’ linguistic and emotion cues during the first 2 years of postnatal life. In the next section, we consider how infants learn these rich communicative systems.

Learning complex communicative systems

Traditionally, theories about language and emotion development emphasize either (a) evolutionarily preserved, universal aspects of each domain, or (b) functional and environmental adaptations of each domain—and, to be clear, there is compelling evidence for both. According to the former, nativist, perspectives, the human brain possesses rudimentary features or biases that direct infant learning to salient aspects of the environment. Nativist perspectives offer an account of similarities across individuals and cultures, and a plausible explanation for how infants understand language and emotion so early in development. In contrast, empiricist views hold that, prior to sensory learning, the human brain does not include content, biases, or packages of skills. Instead, empiricist approaches emphasize the role of sensory experience as the basis of knowledge. This approach accounts for variability across individuals and similarities across domains of cognition. Each of these perspectives assumes a different initial state of knowledge about language and emotion in the human brain. These questions about the initial state of the human brain are often confounded with issues about how knowledge is acquired or transformed. One way in which developmental change might occur is that specialized capacities support infant learning of specific skills or information. This view, domain‐specificity, holds that many aspects of cognition, including language and emotion, are supported by distinct, evolutionarily‐specified learning processes or biases that guide acquisition. These processes are tailored to, and prepared for, specific input in a particular domain. In contrast, domain‐general theories argue that a common set of computational principles drive learning across multiple domains. In this view, domains develop differently not because of their initial state, but because of differences in the sensory or structural properties of their inputs. 
Domain specificity is typically associated with nativism, whereas domain generality is often paired with empiricist approaches. But these types of theories need not be paired. In principle, the nativist/empiricist dimension is orthogonal to the domain‐specific/domain‐general dimension. Below, we highlight how a nativist, domain‐general approach can be applied to language and emotion learning.

Statistical learning in human infants

Humans are born with an array of perceptual and cognitive abilities used to interact with and learn from the world. One such ability is sensitivity to statistical regularities in the environment. Infants can track a range of statistical regularities, including exemplar frequency, forward and backward transitional probabilities, non‐adjacent co‐occurrences, and category‐level patterns (for recent reviews, see Frost, Armstrong, & Christiansen, 2019; Saffran & Kirkham, 2018). This sensitivity to statistical regularities does not appear to be acquired via experience: it is evident days after birth (Bulf, Johnson, & Valenza, 2011; Fló et al., 2019; Teinonen, Fellman, Näätänen, Alku, & Huotilainen, 2009) and in many non‐human animal species (Boros et al., 2021; Santolin & Saffran, 2018; Wilson et al., 2020). While infants detect statistical regularities in the auditory and visual modalities, infants are generally more proficient at tracking auditory sequences than visual sequences (Emberson, Misyak, Schwade, Christiansen, & Goldstein, 2019; Krogh, Vlach, & Johnson, 2013), likely because the auditory world is fleeting and highly sequential, while many aspects of the visual environment are more stable (Conway & Christiansen, 2009; Saffran, 2002). Infants can also track cross‐modal correlations, such as those between labels and objects (L. B. Smith & Yu, 2008; Vouloumanos & Werker, 2009) or between facial movements and vocalizations (Grossmann, Striano, & Friederici, 2006; Kahana‐Kalman & Walker‐Andrews, 2001; Vaillant‐Molina, Bahrick, & Flom, 2013). This sensitivity to regularities in the environment, including distributions, probabilities of co‐occurrence, and correlations, is known as statistical learning. Below, we suggest that statistical learning approaches can provide explanations for how infants learn to comprehend key aspects of language and emotion.
Statistical learning abilities appear to be domain‐general, in that their computational underpinnings are not designed for a specific domain (i.e., language or emotion), but are available for learning information drawn from numerous domains. Despite being domain‐general, our statistical learning approach is also nativist, in the sense that learning is constrained by genetically endowed computational abilities and constraints on perception and processing. This perspective is also inherently developmental, in that infants’ learning is constrained by emerging cognitive abilities (e.g., attention, working memory), prior sensory experiences, and learning in other domains (e.g., social cognition). Below, we briefly overview evidence concerning infant statistical learning of language and emotion, focusing on how and why infants may track statistics across each of these domains.
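
To make the notion of a transitional probability concrete, the following is a minimal illustrative sketch and not a model taken from any of the studies cited here. It assumes that the input can be represented as a list of discrete units (the nonsense syllables, the three made-up "words," and the word order below are all hypothetical) and computes forward and backward transitional probabilities for adjacent pairs. Nothing in the computation refers to speech, which is one way of picturing what a domain-general computation means.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Return (forward, backward) transitional probabilities for adjacent pairs.

    forward[(a, b)]  = count(a followed by b) / count(a in first position of a pair)
    backward[(a, b)] = count(a followed by b) / count(b in second position of a pair)
    """
    pairs = list(zip(stream, stream[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    second_counts = Counter(b for _, b in pairs)
    forward = {p: c / first_counts[p[0]] for p, c in pair_counts.items()}
    backward = {p: c / second_counts[p[1]] for p, c in pair_counts.items()}
    return forward, backward


# Hypothetical input: three made-up trisyllabic "words" concatenated
# without pauses, in an arbitrary illustrative order.
WORDS = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
ORDER = "ABCACBABCBAC"
STREAM = [syllable for w in ORDER for syllable in WORDS[w]]

forward, backward = transitional_probabilities(STREAM)
print(forward[("bi", "da")])  # within-word pair: 1.0
print(forward[("ku", "pa")])  # pair spanning a word boundary: 0.5
```

In this toy stream, pairs inside a word are perfectly predictable while pairs that span a word boundary are not, which is the kind of contrast that could in principle cue where words begin and end. The same function would run unchanged over a sequence of facial configurations or tones.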

Statistical regularities in linguistic and emotional input

For statistical learning to occur, language and emotional input must contain learnable regularities. Early statistical learning studies focused on word segmentation: how infants detect where words begin and end in the absence of clear perceptual boundaries, like pauses (Aslin, Saffran, & Newport, 1998; Goodsitt, Kuhl, & Morgan, 1993; Saffran, Aslin, & Newport, 1996). This body of research suggests that infants are sensitive to probabilities of syllable co‐occurrence—important cues to word boundaries—in both simplified artificial languages and natural speech (for recent theoretical reviews, see Erickson & Thiessen, 2015; Saffran, 2020). Infants are also sensitive to statistics at different levels of analysis, including distributional statistics for phoneme categories (Maye et al., 2002), cross‐situational statistics for label‐referent pairs (L. B. Smith & Yu, 2008; Vouloumanos & Werker, 2009), and category‐based regularities in grammatical structures (Gomez & Gerken, 1999; Lany & Saffran, 2010; Saffran et al., 2008). In each case, infants’ statistical learning abilities appear well‐suited to linguistic input, at least in simplified laboratory tasks. Indeed, it is plausible that linguistic input itself has been shaped by human statistical learning abilities: only learnable structures should persist in the languages of the world (Christiansen & Chater, 2008; Saffran, 2001). Emotion learning has not traditionally been framed in terms of statistical learning, but more recent accounts are beginning to examine the role of regularities in children's emotional environments (Doan, Friedman, & Denison, 2018; Plate, Wood, Woodard, & Pollak, 2019; Woodard et al., 2021). Similar to sequences of syllables, infants are sensitive to transitional probabilities in artificial sequences of facial configurations (Mermier, Quadrelli, Turati, & Bulf, 2022). Infants are also sensitive to category‐based regularities in expressive behaviors. 
As an example, they perceive a group of individuals who are smiling to belong to the same category, in contrast to individuals who are not smiling (Ruba & Repacholi, 2020a). Infants also expect that laughter will co‐occur with smiles, and that a person will smile rather than frown after receiving a gift (Grossmann et al., 2006; Kahana‐Kalman & Walker‐Andrews, 2001; Ruba, Meltzoff, & Repacholi, 2019; Vaillant‐Molina et al., 2013). Infants use these regularities to make predictions about other people's behavior, expecting that someone who appears to be angry will continue to display anger in new social contexts (Doan, Friedman, & Denison, 2020; Plate et al., 2019; Repacholi, Meltzoff, Hennings, & Ruba, 2016; Repacholi, Meltzoff, Toub, & Ruba, 2016). Taken together, these data indicate that infants and young children are sensitive to various statistical regularities in emotion input.
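
As a hedged illustration of the cross‐situational statistics for label‐referent pairs mentioned above, the sketch below simply tallies how often each word co-occurs with each object in view across several individually ambiguous scenes. The scenes, word forms, and the simple "most frequent co-occurrence" decision rule are assumptions made for illustration; they are not the procedure or the model used in the cited studies.

```python
from collections import defaultdict


def cross_situational_counts(scenes):
    """Count how often each word co-occurs with each candidate referent."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in scenes:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return counts


def best_referent(counts, word):
    """Return the referent that co-occurred most often with `word`."""
    return max(counts[word], key=counts[word].get)


# Hypothetical scenes: each pairs the words heard with the objects in view.
# No single scene reveals which word goes with which object.
SCENES = [
    (["dog", "ball"], ["DOG", "BALL"]),
    (["dog", "cup"], ["DOG", "CUP"]),
    (["ball", "cup"], ["BALL", "CUP"]),
    (["dog", "shoe"], ["DOG", "SHOE"]),
]

counts = cross_situational_counts(SCENES)
for word in ("dog", "ball", "cup"):
    print(word, "->", best_referent(counts, word))
```

A word heard only once ("shoe" in this toy input) stays ambiguous under this rule, because its candidate referents are tied. The same tallying could in principle be applied to pairings of expressive behaviors and situational contexts, although that extension is an assumption of this sketch rather than a finding reported here.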

Primitives and constraints for statistical learning

There are an infinite number of statistics that infants could compute over the input that they experience in their environments. How do infants determine which regularities to track? One important factor is that infants track statistics over a limited set of perceptual primitives, which necessarily vary across domains. In the domain of language, infants learning spoken language initially track information about distributions of phonemes or syllables, whereas infants learning sign languages track information about the distribution of handshapes and movement trajectories. Learners acquiring languages in both modalities attend to regularities in the face, though the relevant information differs—the mouth provides speech cues for spoken languages (including information about rhythm and phonemic content), whereas sign languages use a wide range of facial cues including both the eyes and mouth. The primitives over which infants compute language statistics are also affected by prior learning. As novice learners, infants are confronted with a stream of sounds (or signs). Before they can learn word meanings, they must first figure out where words begin and end (a complex endeavor given that there are no reliable acoustic markers of word boundaries in fluent speech). Similarly, infants cannot track relationships among lexical categories (like nouns and verbs) until they have figured out which words belong to each category. Languages also contain statistical distributions relevant for some learners but not others. As previously mentioned, lexical tones (i.e., pitches and pitch contours associated with words) are crucial for acquiring vocabulary in tonal languages like Mandarin or Hmong, but not for languages like English, where the tone is uncorrelated with word meanings. Thus, the primitives that are used for language learning change dynamically as learning unfolds. What infants have already learned alters what becomes available to learn in the future. 
Far less research has examined the primitives for emotion learning. However, extant research provides clues as to the information that may be most relevant to infant learners. For instance, infants may initially track information about transitional probabilities and co‐occurrences in expressive behaviors across modalities, particularly between faces and voices (newborns have attentional biases for faces and primate vocalizations; Johnson, Dziurawiec, Ellis, & Morton, 1991; Mermier et al., 2022; Morton & Johnson, 1991; Shackman & Pollak, 2005; Vouloumanos, Hauser, Werker, & Martin, 2010). Infants may also track the situational contexts in which these expressive behaviors occur, such as those contexts associated with alleviation of distress (Moriceau & Sullivan, 2006). As with language, the primitives over which infants compute statistics are likely impacted by prior learning. Around 7 months of age, infants preferentially attend to (and perhaps track statistics for) negative expressive behaviors (for a review, see Vaish, Grossmann, & Woodward, 2008). These changing attentional biases may reflect an increase in caregivers’ negative expressive behaviors, coinciding with the onset of infant self‐produced locomotion (Campos et al., 2000). Additionally, emotions likely contain statistical distributions relevant for some learners but not others. Infants with depressed or maltreating caregivers may primarily observe negative expressive behaviors (Plate et al., 2019), and thus, may preferentially (and adaptively) track these behaviors across different situations. As with language, the primitives used for emotion learning are likely dynamic and changing across the first few years of life alongside other aspects of cognitive, motor, and social development (Herzberg, Fletcher, Schatz, Adolph, & Tamis‐LeMonda, 2021; Hoemann et al., 2020; Ruba & Pollak, 2020). Another key constraint on statistical learning lies in the computations themselves. 
Infants do not track every statistical regularity available to them, and many questions remain about the limits of statistical learning (for additional discussion, see Saffran & Kirkham, 2018). One hint comes from research on structures in language and emotion input. For example, infants better learn auditory units that contain predictive patterns in both linguistic and nonlinguistic materials (Saffran et al., 2008; Santolin & Saffran, 2019). The structures that infants find more learnable are precisely those that tend to occur in the languages of the world. It is plausible that similar connections occur in emotion learning. For instance, caregivers may naturally express their emotions in ways that maximize learning possibilities (e.g., “emotionese”; Benders, 2013; Ruba & Repacholi, 2020b; Schachner & Hannon, 2011; Trainor, Austin, & Desjardins, 2000; Wu, Schulz, Frank, & Gweon, 2021). This includes displaying emotions in congruent, multimodal ways across predictable situations—typically smiling and laughing during playtime, frowning and raising their voice when mediating a tantrum, or exclaiming “ew” with a scrunched nose when changing a diaper. Like language, these predictive patterns may help infants learn about when and how other people tend to display their emotions. Finally, statistical learning is constrained by the bodies and brains in which it occurs—infants are limited in the types of information they perceive and in the ways that they can act on the world. For instance, newborns’ relatively poor visual acuity and motor abilities bias their attention to proximally close objects, such as faces. The faces that dominate infants’ early visual environments are persistent in the visual field and tend to be of fewer than three individuals (Jayaraman & Smith, 2019; Jayaraman, Fausey, & Smith, 2017). 
With these early, frequent, and close‐up experiences with faces, infants may preferentially attend to facial movements that provide information about their caregivers’ emotional states and the language environment (e.g., rhythm regularities). With advances in visual and motor development, older infants turn their attention to hands and objects (Fausey, Jayaraman, & Smith, 2016). This shift provides new visual access to labeled referents in the environment as well as the situational contexts in which emotionally expressive behaviors occur. Memory development also constrains statistical learning. The limited working memory abilities of infants may, paradoxically, help learning in domains where input is noisy. In noisy environments, infants can home in on generalizations rather than being sidetracked by specific exemplars (Newport, 1990). Together, these examples illustrate how various developmental processes work to constrain and focus learning (Ruba & Pollak, 2020).
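
The transitional probabilities and cross‐modal co‐occurrences discussed in this section can be made concrete with a small calculation. The sketch below is illustrative only: the syllable stream, the face and voice labels, and the function names are hypothetical and are not taken from any study cited here. It assumes that a forward transitional probability is defined as count(current, next) / count(current), and it treats face and voice pairings as simple conditional frequencies.

```python
from collections import Counter


def transitional_probabilities(sequence):
    """Forward transitional probability P(b | a) for each adjacent pair (a, b)."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Only elements that have a successor can start a pair.
    first_counts = Counter(sequence[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def cooccurrence_probabilities(events):
    """Conditional frequency P(voice | face) from (face, voice) observations."""
    pair_counts = Counter(events)
    face_counts = Counter(face for face, _ in events)
    return {(f, v): n / face_counts[f] for (f, v), n in pair_counts.items()}


# Hypothetical syllable stream built from two made-up "words": bi-da and ku-pa.
stream = ["bi", "da", "ku", "pa", "bi", "da", "bi", "da",
          "ku", "pa", "ku", "pa", "bi", "da"]
tps = transitional_probabilities(stream)

# Hypothetical face/voice pairings observed across several episodes.
episodes = [("smile", "laugh"), ("smile", "laugh"), ("smile", "neutral"),
            ("frown", "raised"), ("frown", "raised"), ("smile", "laugh")]
cooc = cooccurrence_probabilities(episodes)

for pair, p in sorted(tps.items()):
    print(pair, round(p, 2))
for pair, p in sorted(cooc.items()):
    print(pair, round(p, 2))
```

In this toy stream, transitions inside a made-up word (bi to da, ku to pa) have a probability of 1.0, whereas transitions that span a word boundary are lower. That contrast is the kind of cue a learner could in principle use to find unit boundaries.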

Why do infants track statistical regularities?

Language and emotion are evolutionarily advantageous communicative systems. These systems not only allow infants to learn about their environment but also to share their desires, goals, and internal states with other humans. Infants are likely intrinsically motivated to learn about language and emotion to communicate and navigate their social worlds. However, it is an open question whether the drive to communicate is a key factor very early in development: as noted previously, even neonates track statistical regularities (Bulf et al., 2011; Fló et al., 2019; Teinonen et al., 2009). Infants may also track statistics because statistical regularities facilitate predictive processing (Köster, Kayhan, Langeloh, & Hoehl, 2020; Romberg & Saffran, 2013; Saffran, 2020). That is, rather than possessing isolated “statistical learning mechanisms,” infants may exploit statistical regularities to detect and learn from prediction errors (Elman, 1990), to generate faster and more accurate predictions about upcoming input, and as part of the process of organizing information in memory (e.g., Thiessen, 2017). Predictive information is highly valuable for encoding and processing both language and emotion. Real‐time language processing is extremely challenging, given the rapidity with which linguistic signals unfold. Statistical regularities in words and word combinations may speed up infants’ ability to process this information, map it to meanings, and deal with challenging listening conditions (Graf Estes, Evans, Alibali, & Saffran, 2007; Lany, Shoaib, Thompson, & Estes, 2018; McMillan & Saffran, 2016). Relatedly, words presented in predictable contexts may be easier to learn than those presented in less predictable contexts (Benitez & Saffran, 2018; Benitez & Smith, 2012; Eiteljoerge, Adam, Elsner, & Mani, 2019). Real‐time emotion processing is also challenging, given the many manners and situations in which emotions are expressed. 
Infants may use statistical regularities in a person's emotional behavior to predict how that person will behave in novel situations (Repacholi, Meltzoff, Hennings, et al., 2016; Repacholi, Meltzoff, Toub, et al., 2016). Finally, statistical information may help infants direct their learning and attentional resources toward the most informative data in their environment. In other words, tracking statistics can help infants reduce uncertainty. During word learning, children and adults preferentially sample items about which they are more uncertain (Zettersten & Saffran, 2021). In the affective domain, infants often engage in social referencing (i.e., seeking out emotional cues from an adult) during novel and ambiguous situations (Kim & Kwak, 2011; Sorce, Emde, Campos, & Klinnert, 1985). Thus, rather than being passive “sponges” soaking up regularities, infants are active learners, using statistics to inform the data that they sample. Consideration of the infant as an active learner may explain how infants learn so much so quickly (Mani & Ackermann, 2018; Raz & Saxe, 2020). Infants may direct their prodigious learning abilities toward the sources that maximize information gain, making learning more efficient.
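
One way to make the idea of sampling where uncertainty is highest concrete is to compute Shannon entropy over a learner's current beliefs. The sketch below is a toy illustration, not a model from the cited studies: the novel words, the candidate referents, and the belief values are all hypothetical, and entropy is only one of several possible measures of uncertainty.

```python
import math


def entropy(distribution):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


def most_uncertain(beliefs):
    """Return the item whose belief distribution has the highest entropy."""
    return max(beliefs, key=lambda item: entropy(beliefs[item]))


# Hypothetical beliefs about which referent each novel word names.
beliefs = {
    "dax": {"ball": 0.9, "cup": 0.1},
    "wug": {"ball": 0.5, "cup": 0.5},
    "fep": {"ball": 0.7, "cup": 0.3},
}

# An uncertainty-driven learner would seek more data about this item first.
next_item = most_uncertain(beliefs)
print(next_item)
```

Here the learner selects the word whose candidate meanings are equally likely, because further evidence about that word would reduce uncertainty the most under this simple measure.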

Outstanding questions

In this review, we have highlighted how domain‐general statistical learning abilities may explain how humans learn about key aspects of language and emotion. This approach provides potential explanations for differences in what and when infants learn about these domains. Our approach suggests that input that is structured, consistent, predictable, and engaging to infants should be learned more rapidly than input that is riddled with exceptions and noise, or that is not particularly interesting to young learners. In other words, differences in the input to learning contribute to differences in infants’ ability to learn from this input. This approach opens the door for exciting future research directions. First, researchers—particularly emotion researchers—need to precisely describe the structure of learning input. What range, frequency, and types of input do infants observe in their daily lives? What are the learnable statistical regularities in this input? From this information, researchers can begin to specify the primitives over which infants track statistical information. Are these primitives apparent from birth, or are they learned via exposure to language and emotion input? For instance, it is possible that infants initially attend to emotion information on faces since faces are evolutionary‐adaptive emotion signals (Ekman, 1994), or since faces are salient and persistent in newborns’ visual fields (L. B. Smith, Jayaraman, Clerkin, & Yu, 2018). Similarly, communicative sounds—even those from other species—seem to help young infants organize their experiences into categories that will eventually become words (for a recent review, see Perszyk & Waxman, 2018). It is also imperative to specify how infant learning is constrained by other developing abilities, such as memory, attention, and language. 
Comparing children at multiple ages will be critical to this work (for recent examples, see Raviv & Arnon, 2018; Ruba, Meltzoff, & Repacholi, 2020a). It will also be important to move beyond “classic” statistical learning tasks (i.e., a learning phase followed by a separate test phase). Statistical learning researchers have made great strides in developing novel, continuous measures of real‐time learning in adults and children (Arnon, 2020; Frost et al., 2019; Kidd et al., 2020). Infancy researchers are beginning to do so as well, with both neuroimaging methods (e.g., functional magnetic resonance imaging [fMRI], functional near‐infrared spectroscopy [fNIRS], electroencephalography [EEG]; Choi, Batterink, Black, Paller, & Werker, 2020; Ellis et al., 2021; Emberson, Richards, & Aslin, 2015) and behavioral measures (e.g., pupillometry, anticipatory eye movement tasks; Havron, de Carvalho, Fiévet, & Christophe, 2019; Reuter, Borovsky, & Lew‐Williams, 2019; Romberg & Saffran, 2013; Zhang & Emberson, 2020). By moving away from passive measures of learning, researchers can discover how infants actively solve learning problems. Further, many paradigms used to assess statistical learning use artificial stimuli (static faces, disembodied voices) and/or clearly non‐communicative stimuli like geometric shapes or computer alert sounds. Although infants learn in these paradigms, it seems unlikely that infants perceive this learning as advantageous for social communication. Future research can help to tease apart myriad motivations for learning, from uncertainty reduction to a drive to communicate. It will also be important to better understand the computational underpinnings of learning in these two domains. Other approaches, such as Bayesian cognitive models, have also been applied to both language learning and, more recently, emotion learning (e.g., Wu et al., 2021; Xu & Tenenbaum, 2007). 
There are many important differences between, for example, Bayesian and connectionist models. However, researchers are converging on approaches to understanding development in which environmental input is treated as information that the child learns by generating predictions and updating those predictions based on errors. The goal of such learning is to reduce uncertainty. Perhaps the types of learning outlined in the current paper—starting from primitives available to infants very early in life—provide some of the priors that both constrain the acquisition of more complex aspects of language and emotion and serve as the basis for the development of increasingly complex behaviors. In sum, we have highlighted how a comparative developmental approach—intersecting language, emotion, and potentially other domains (e.g., social category development, multimodal perceptual development)—provides a useful lens through which to consider the learning problems faced by fledgling communicators. As researchers, we tend to focus on our own domain of expertise (e.g., language, emotion). However, considering how development unfolds through the lenses of multiple domains can generate useful insights that may not be obvious when focusing on a single domain (e.g., Hoemann et al., 2020; Maurer & Werker, 2014). By looking beyond a single domain and, ideally, by developing collaborative research programs, researchers may come closer to understanding how humans rapidly learn to understand and communicate with others.

104 in total

1. Phonological Knowledge Guides Two-year-olds' and Adults' Interpretation of Salient Pitch Contours in Word Learning.

Authors: Carolyn Quam; Daniel Swingley
Journal: J Mem Lang Date: 2010-02-01 Impact factor: 3.059

2. Experiential influences on multimodal perception of emotion.

Authors: Jessica E Shackman; Seth D Pollak
Journal: Child Dev Date: 2005 Sep-Oct

3. Faces in early visual environments are persistent not just frequent.

Authors: Swapnaa Jayaraman; Linda B Smith
Journal: Vision Res Date: 2018-06-20 Impact factor: 1.886

4. Infant sensitivity to distributional information can affect phonetic discrimination.

Authors: Jessica Maye; Janet F Werker; LouAnn Gerken
Journal: Cognition Date: 2002-01

5. Mapping the Passions: Toward a High-Dimensional Taxonomy of Emotional Experience and Expression.

Authors: Alan Cowen; Disa Sauter; Jessica L Tracy; Dacher Keltner
Journal: Psychol Sci Public Interest Date: 2019-07

6. Infant exuberant object play at home: Immense amounts of time-distributed, variable practice.

Authors: Orit Herzberg; Katelyn K Fletcher; Jacob L Schatz; Karen E Adolph; Catherine S Tamis-LeMonda
Journal: Child Dev Date: 2021-09-13

7. Infant statistical-learning ability is related to real-time language processing.

Authors: Jill Lany; Amber Shoaib; Abbie Thompson; Katharine Graf Estes
Journal: J Child Lang Date: 2017-07-19

8. Differences in early parent-child conversations about negative versus positive emotions: implications for the development of psychological understanding.

Authors: Kristin Hansen Lagattuta; Henry M Wellman
Journal: Dev Psychol Date: 2002-07

9. Developing an Understanding of Emotion Categories: Lessons from Objects. (Review)

Authors: Katie Hoemann; Rachel Wu; Vanessa LoBue; Lisa M Oakes; Fei Xu; Lisa Feldman Barrett
Journal: Trends Cogn Sci Date: 2019-11-29 Impact factor: 20.229

10. Consistency of co-occurring actions influences young children's word learning.

Authors: Sarah F V Eiteljoerge; Maurits Adam; Birgit Elsner; Nivedita Mani
Journal: R Soc Open Sci Date: 2019-08-07 Impact factor: 2.963