Literature DB >> 35476636

How do children overcome their pragmatic performance problems in the true belief task? The role of advanced pragmatics and higher-order theory of mind.

Lydia Paulin Schidelko¹, Marina Proft¹, Hannes Rakoczy¹.

Abstract

The true belief (TB) control condition of the classical location-change task asks children to ascribe a veridical belief to an agent to predict her action (analog to the false belief (FB) condition to test Theory of Mind (ToM) abilities). Studies that administered TB tasks to a broad age range of children yielded surprising findings of a U-shaped performance curve in this seemingly trivial task. Children before age four perform competently in the TB condition. Children who begin to solve the FB condition at age four, however, fail the TB condition and only from around age 10, children succeed again. New evidence suggests that the decline in performance around age four reflects pragmatic confusions caused by the triviality of the task rather than real competence deficits in ToM. Based on these results, it can be hypothesized that the recovery of performance at the end of the U-shaped curve reflects underlying developments in children's growing pragmatic awareness. The aim of the current set of studies, therefore, was to test whether the developmental change at the end of the U-shaped performance curve can be explained by changes in children's pragmatic understanding and by more general underlying developmental changes in recursive ToM or recursive thinking in general. Results from Study 1 (N = 81, 6-10 years) suggest that children's recursive ToM, but not their advanced pragmatic understanding or general recursive thinking abilities predict their TB performance. However, this relationship could not be replicated in Study 2 (N = 87, 6-10 years) and Study 3 (N = 64, 6-10 years) in which neither recursive ToM nor advanced pragmatic understanding or recursive thinking explained children's performance in the TB task. The studies therefore remain inconclusive regarding explanations for the end of the U-shaped performance curve. Future research needs to investigate potential pragmatic and general cognitive foundations of this developmental change more thoroughly.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35476636 PMCID： PMC9045612 DOI： 10.1371/journal.pone.0266959

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.752

Introduction

Theory of Mind (ToM) is the social-cognitive ability to think and reason about one’s own and others’ mental states [1]. At the core of ToM lies meta-representation: the capacity to represent that subjects represent the world in a certain way that can differ from one’s own current perspective. The litmus test for ToM is the so-called false belief (FB) task. In this task, participants need to track a story protagonist’s belief that comes to differ from reality: Participants see how an object is placed at location 1 in the presence of a protagonist and is then moved from location 1 to a new location 2 in the protagonist’s absence. Participants are then asked to ascribe to the protagonist a belief about the object’s location (“Where does she think the object is?”) or to predict the protagonist’s behavior based on her belief (“Where will she look for her object upon her return?) [2]. Individuals with an (explicit) ToM predict that the protagonist will look for her object–based on her false belief–in location 1. Developmentally, children succeed FB tasks around the age of four years. Children younger than four years systematically fail. They predict confidently that the protagonist will look–according to reality–in location 2 [2, 3]. In the history of ToM research, many FB studies also administered an additional True Belief (TB) condition to the children. The TB condition serves as a baseline and control measure to ensure that children, especially younger ones, can cope with the narrative task structure. It is structurally similar to the FB task with the only difference that the protagonist witnesses the object transfer from location 1 to location 2 and thus has a veridical belief about the object’s location. Typically, children younger than four years who fail the FB condition, pass the TB condition [3]. Recently, however, the TB condition has been administered to a broader age range of children, with quite puzzling results: with age, children get worse in the TB condition. More specifically, children from age four who begin to solve the FB task start to fail the TB task [4-7]. The initial pattern found in younger children–passing the TB condition while failing the FB condition–reverses around this age. Children from age four succeed in the FB condition but fail the TB condition. This performance pattern (younger children pass TB and fail FB and vice versa for older children), reveals itself also at an individual level, in strong negative correlations of the two versions of the tasks. Remarkably, the failure in the TB task persists into late childhood: Only from age seven to ten, performance in the TB task recovers. At this point in development, children pass both FB and TB for the first time. Taken as a whole, the development of performance in the TB task follows a U-shaped trajectory: from high performance in young children around age three to a dramatic decrease around age four when children solve the FB condition to a recovery of performance only in later childhood around age seven to ten. FB performance, in sharp contrast, remains constantly high from around age four [6]. As any U-shaped curve in performance, this unexpected developmental pattern raises at least two fundamental questions: First, how does the decrease in performance in the TB task come about at the beginning of the U-shaped curve? Why do children start to fail TB tasks once they come to master FB tasks? Second, how does the recovery of performance come about at the end of the U-shaped curve? Why do children from around age seven to ten overcome their intermediate difficulty with TB tasks? One possible answer to both questions is the following: The developmental pattern of the U-shaped performance curve in the TB task does not reflect children’s ToM competencies but the development of an understanding of pragmatics. Children’s failure in this intermediate state between four and ten is not based on a fundamental problem in ToM understanding but on pragmatic performance limitations [8]. In general, pragmatics pertains to comprehension and production of speech acts and discourse that goes beyond mere literal meaning. For a comprehensive understanding of most speech acts and discourse, additional information besides the literal meanings of the words (sentence meaning) needs to be taken into account in order to determine what is meant (speaker meaning): for example, information about who made an utterance, in which context, against the background of which rules etc.–and in particular, the recipient needs to figure out the speaker’s intentions underlying the speech act in question [e.g., 9]. Pragmatics is thus, in some sense, a form of applied ToM. Regarding the TB performance, it seems quite clear that children from early on do understand the semantics, the literal meaning of the words in the TB test question. Perhaps, however, children in the intermediate state (between ages four and ten) struggle with understanding the use of the test question “Where does the protagonist think the object is?” They do not yet understand what the interlocutor means or wants by asking this question. But why should the TB question be pragmatically challenging? Why do children struggle to grasp the experimenter’s intention in asking the TB test question? A closer inspection of the TB task and the corresponding question reveals that it combines a number of properties that jointly make the task quite peculiar from a pragmatic point of view. First, the TB task is highly trivial: The protagonists clearly sees that the object is moved to a new location and the protagonist, the child and the experimenter share this knowledge about where the object is and everyone knows that the others know, too (this is thus common ground or mutual knowledge). Second, the TB test question is an academic question. The experimenter knows the answer herself. She does not ask this question to gain new information but rather to test whether the child knows the answer, too. Academic question formats are difficult to grasp for young children [10] (for effects of the interviewer’s knowledge on children’s answer behavior, see also [11-13], for related proposals regarding the role of pragmatic factors in FB and other ToM tasks, see, [14-17]. Third, the TB test question asks for a belief ascription or a belief-based action prediction [6]. Normally, we tend to talk about beliefs when we refer to or at least raise the possibility of their falsity [18]. In the TB scenario, however, there is no such obvious possibility. From a purely semantic point of view, the TB test question is utterly unproblematic, indeed highly trivial. Children with a merely literal language use with little sense of pragmatics should thus not find such questions taxing. But for language users with some pragmatic sensitivity, such questions should appear at least prima facie odd. In light of the first question raised above (How does the decrease in performance in the TB task come about at the beginning of the U-shaped curve?), the pragmatic analysis thus yields the following (somewhat simplified) picture: Young children without a sophisticated understanding for ToM are limited in their pragmatic language understanding. They mostly use and interpret language literally (but see [19]). Children in this stage of development thus should have no problems with the TB task. However, once children start to develop ToM capacities (i.e., when they pass the FB task), these lay the ground for developing pragmatic understanding [20, 21]. However, their initial ToM and pragmatic understanding are still fragile at this age and their fragile pragmatics then leads them astray in the TB test. Some evidence speaks in favor of this. First, as reviewed above, performances in the TB and FB tasks are negatively correlated such that children first fail FB and pass TB, and then show the reverse pattern [6]. The performance pattern matches the predictions of the pragmatic analysis such that children’s failure in the TB task depends on their success in the FB task: once children develop the prerequisite ToM capacities, they develop an understanding for pragmatics. As a consequence of this development, they suffer from the pragmatic peculiarity of the TB question and fail to answer it correctly while passing the FB task [6]. Second, children succeed in both the TB and FB task after modifications of the task pragmatics in the TB task. For example, children solve the TB task without any decline in the performance curve when the task is presented without or with less trivial language [8]. These results suggest that children show no more difficulties with answering the TB test question correctly once peculiar factors of the tasks are removed. Taken together, first evidence confirms the predictions of the pragmatic analysis at the beginning of the U-shaped performance curve. The developmental processes at the end of the U-shaped performance curve, however, still raise open questions. How do children overcome their pragmatic difficulties in the TB task later in development, and how does this explain performance recovery at the end of the U-shaped curve? Currently, hardly anything is known about how children overcome their performance limitation in the TB task. From a pragmatic point of view, one possibility is the following: Regarding the beginning of the U-shaped development, the pragmatic analysis predicts that children start to get confused by the peculiarity of the TB test question once they develop sensitivity for pragmatics on the basis of their developing ToM. Applying the same logic to the end of the U-shaped curve, the pragmatic analysis predicts that children succeed again in the TB task once they have undergone further pragmatic development (adults, after all, even though they may find this type of questions funny, have no difficulty in answering it correctly). Taking new steps in pragmatics development might enable children to reason about language use on a higher, more sophisticated level of pragmatic interpretation, and thus to grasp more complex and advanced forms of discourse and speech acts. Imagine, to illustrate the point, someone asks you, “Did you enjoy the nice weather yesterday?” when it in fact was raining cats and dogs all day. Interpreting this as a regular question asked in order to receive new information would be highly confusing given that the presupposition (“The weather was nice yesterday”) is not fulfilled. To understand the actual speaker meaning, you need to stand back from the literal meaning of the question and interpret it at a different, higher level. For example, you might infer the speaker’s intention to make an ironic comment about the rainy weather, or you may interpret the question as an academic exercise such that the speaker’s intention is to test your knowledge of English past tense. What these examples thus illustrate is that a prima facie odd question can make perfect sense when interpreted on a new and higher pragmatic level [22]. Applied to the TB performance, this developmental step might enable children to focus on a new interpretational level that allows them to resolve their initial confusions about the peculiar test question in the TB task (“strange question, but then I’m participating in a study after all; researchers do ask strange questions…”). If this idea holds, children who are able to reason at a flexible, higher-order level of pragmatic interpretation should be more likely to answer the TB test question correctly. Accordingly, the emergence of flexible, higher-order pragmatic understanding would determine the end of the U-shaped performance curve. But what exactly happens in children’s pragmatic development at the end of the U-shaped curve? How can the crucial pragmatic development at the end of the U-shaped curve be described, and what are important foundations and correlates of this development? The pragmatic analysis predicts a developmental progress in advanced pragmatics at the end of the U-shaped curve. If indeed the developmental curve reflects pragmatic progress more generally, this should also become apparent in other areas of advanced pragmatics. To this end, we will compare children’s performance in the TB task at the end of the U-shaped curve with developments in advanced pragmatic understanding more generally (regarding comprehension of indirect speech acts). The progress in advanced pragmatics, in turn, could itself be rooted in growing recursive ToM capacities. And the development in higher-order recursive ToM, in turn, might be based on a more general ability for recursive thinking. In the following, we discuss these three possibilities in turn.

Advanced pragmatic understanding

The crucial step in pragmatic development that leads to the end of the U-shaped curve may reflect a more general phenomenon of developmental progress in pragmatics. Such a general progress in advanced pragmatics might become evident when comparing the performance in the TB task and advanced pragmatics in other areas. Generally, pragmatics is neither a simple and unitary phenomenon [23] nor do pragmatic abilities emerge in simple uniform ways across ontogenetic development [24]. Most relevant for present purposes are advanced forms of pragmatic understanding that tend to emerge comparatively late in development. Non-literal language understanding, such as ironic and metaphorical utterances, are prototypical examples [21, 25, 26]. For example, in uttering "It’s great weather for our picnic today" when it is raining all day, the speaker does not want the hearer to take her utterance literally. The speaker rather intends the hearer to belief that she thinks that it is not nice to have a picnic outside [27, p.262]. Accordingly, the hearer needs to suppress the initial, literal interpretation to be able to infer the actual meaning, for example, by taking context information into account [24]. Similar processes may be at work in the TB task such that children might overcome their confusion about the trivial test question (“This is too easy, maybe I missed something”) by suppressing this initial interpretation and taking context information into account (“I’m participating in a study and the experimenter asks test questions”). Interpreting non-literal language, especially ironic utterances, involves ascriptions of complex intentions and is therefore suggested to be an application of ToM [20, 26]. In neurotypical development, children develop an understanding of metaphors during school age or even preschool age [19, 24, 28]. Its relation to ToM abilities, however, has been controversially discussed in the recent literature [28-30]. Evidence on irony comprehension suggests that children develop an understanding of ironic utterances during school age between six and 13 years of age, depending on the kinds of measures used [24]. The relation of irony understanding to ToM is less controversial: children’s performance in irony comprehension correlates with their second-order FB understanding [26, 31].

Recursive ToM

The crucial foundation of development regarding the end of the U-shaped curve might be even more general than sophisticated pragmatics–for example, the capacity for recursive, higher-order mindreading. The standard ToM tests asks for a first-order mental state ascription, but of course this is not where mental state ascription ends. Advanced forms of ToM enable flexible and higher-order forms of mindreading in which additional levels of mental states can be represented. Ascription of a belief about a belief, for example, constitutes second-order mental state ascription. Virtually, mental states can be recursively embedded within each other infinitely (“A thinks that B thinks that C thinks that D thinks that … p”). This understanding for recursively embedded mental states may be the common denominator underlying both advanced pragmatics generally and of the TB performance specifically. It may provide the basis for advanced pragmatics in various forms, for example, by enabling the ascription of higher-order communicative intentions. In the TB task, recursive ToM might enable the ascription of specific higher-order communicative intentions to make sense of the speech act. There are at least two possibilities, how recursive ToM might be involved in the TB task in particular. One possibility is that children who develop an understanding for the pragmatic use of academic test questions (which are also very trivial and pertain to subjective representations) do not any longer get confused by the TB test question at all, but reach the right pragmatic interpretation of the question straight away. They integrate context information and ascribe higher-order intentions and thus make sense of the academic test question from the outset without being confused. A second possibility is that children initially suffer from confusion by the trivial academic TB test question (e.g., “Why does the experimenter ask me such a question? Maybe I must have missed something, maybe the experimenter thinks that I don’t know that the protagonists does not know that the object is in that location”). Younger children might not yet be able to resolve this confusion whereas older children, once they have developed recursive mindreading, might be able to ascribe higher-order mental states and thus to resolve the confusion (e.g., “She wants to know whether I know that the protagonist knows that the object is in the new location”). Either way, children’s performance on the TB test question should then be related to their recursive ToM capacities more generally. Recursive ToM reveals itself in various forms, in many different tasks and situations, but tasks that directly test for this understanding for second- or higher-order mental state ascriptions are rare in the literature. Evidence from this line of research shows that children at the age of five to six years can attribute second-order mental states [“A thinks that B believes that …”; e.g., 32], but only very little is known about children’s development of higher-order ToM beyond second order of recursion. In a study by Liddle and Nettle, for example, ten- to eleven-year-olds performed above chance level in a third-order ToM task and at chance level in a fourth-order ToM task [33]. Adults, in contrast, were able to reason until seventh-order of recursion, in particular, when tested “implicitly” through observing video clips of social interactions compared to explicit measures used in the above reported studies with children [34]. Until now, it therefore remains an open question when exactly children learn to reason about mental states on different levels of embedding and whether this development is fundamental for children’s performance in the TB task.

General ability for recursive thinking

The ability to reason about higher-order mental states, in turn, might be based on a more general ability for recursive thinking. The performance in the TB task, hence, would not only rely on the ability for higher-order ToM but on an even broader ability for recursive operations that is fundamental for higher-order ToM abilities. Thinking and reasoning recursively is not only of importance in the development of ToM but has been implicated in a number of probably uniquely human abilities such as language, music, mathematics or mental time travel [35-37]. Recursive operations in all these areas require embedding of elements (e.g., mental states, words/clauses, etc.) within elements of the same kind [38]. The corresponding level of reasoning can be more or less clearly defined and quantified (e.g., in the domain of ToM: “A thinks that B thinks that C thinks that p” as third-order mental state ascription). The general ability for recursive operations might manifest itself in recursive thinking in the various specific areas of application, including higher-order ToM. Consequently, the developmental changes in TB performance might reflect development in advanced pragmatic that builds on recursive ToM that, in turn, is a manifestation of general recursive operations. Once the child has acquired a certain level of general recursive thinking, this enables her–via recursive ToM–to think pragmatically about the TB task at a higher level, overcome her pragmatic confusion and solve the task. If this hypothesized pattern holds, an individual, first, would be able to think recursively on the same level of embedding in different areas of application; and second, the general level of recursive thinking would, at least partly, predict her performance in the TB task.

Rationale of the present study

In sum, the puzzling developmental pattern of the U-shaped performance curve in the TB task raises two fundamental questions: First, how does the decrease in performance in the TB task come about at the beginning of the U-shaped curve? Second, how does the recovery of performance come about at the end of the U-shaped curve? The pragmatic analysis presents one possible answer to both questions: the U-shaped curve reflects an underlying development in children’s understanding of pragmatics. Previous research has yielded some evidence that speaks to the first question, but so far, no study has empirically addressed the second question. The rationale of the present study, therefore, is to test whether, indeed, the developmental change at the end of the U-shaped performance curve reflects and can be explained by pragmatic development. To this end, we tested a wide age range of children with the standard FB and TB task. Due to the restrictions of the Covid-19 pandemic, testing was conducted in an online format. We ran the studies as moderated online studies in which the experimenter interacted with the child via video chat while presenting the tasks on screen. With this change in setting, the pragmatic context in which the experimenter administers the TB task to the child was essentially different compared to earlier studies. A preliminary question thus was whether the typical performance curve replicates in the new format in children between six and ten years. The age range is expected to include younger children who still fail and older children who succeed again in the TB task (e.g., [6]), representing the right half of the U-shaped performance curve. The main research question then was: What are the factors that explain the end of the U-shaped performance curve in the TB task? Here, we explore different possibilities based on the pragmatic analysis introduced above: Is the TB pattern a function of advanced pragmatics development? Even more generally: Is it a function of recursive ToM, or even of recursive thinking in general? In order to do so, Advanced Pragmatic Understanding was operationalized in a task that asked for children’s metaphor and irony understanding. Recursive ToM was operationalized in tasks that tested children’s understanding and production of higher-order mental state ascriptions. The general ability for recursive thinking was operationalized as children’s recursive language abilities as a proxy for their general recursive abilities. This task tested children’s understanding for embedded recursive clauses. Children were tested in these tasks and the TB task in three online studies. Table 1 displays the tasks administered in Study 1–3.

Table 1

Tasks included in Studies 1–3 in the order presented in the test session.

Task	Study 1	Study 2	Study 3
Recursion in general: recursive language abilities	x	x	x
Change-of-location: TB and FB	x	x	x
Recursive ToM: production		x
Recursive ToM: understanding	x	x	x
Advanced pragmatics understanding	x

Study 1

Method

This research was conducted in accordance with the Declaration of Helsinki and the Ethical Principles of the German Psychological Society (DGPs), the Association of German Professional Psychologists (BDP), and the American Psychological Association (APA). It involved no invasive or otherwise ethically problematic techniques and no deception (and therefore, according to National jurisdiction, did not require a separate vote by a local Institutional Review Board; see the regulations on freedom of research in the German Constitution (§ 5 (3)), and the German University Law (§ 22)). Before the test sessions of Studies 1–3 started, informed consent was obtained from the parents of the subjects.

Design

Children in all three studies were tested in a single session (30–45 minutes) by a female experimenter (E). The tasks were presented remotely (on a laptop computer screen or tablet computer screen, no smartphone) in an interactive online study via a video conferencing platform (mainly BigBlueButton, in case of connection issues, the test session was shifted to Zoom). The tasks were embedded in a video, which was displayed via the conference platform in the middle of the child’s screen and required the child to give verbal answers. Next to the video, the child was constantly able to see the webcam video of E and herself, so that the child and E were able to communicate via audio and video streaming during the whole test session (see [7] for a validation study of this paradigm). Before the beginning of the test session, the caretaker gave verbal consent to the child’s participation in the study and the video and audio recording during the test session. The verbal consent was recorded and stored separately from the recording of the test session. The caretaker and the child were informed that they might abort the participation at any given moment. In the beginning of the test session, E advised the child that she could repeat each question if the child had any comprehension difficulties.

Participants

Eighty-one 6- to 10-year-old children (72–131 months, mean age = 99.52 months; 41 girls, 40 boys) were included in the final sample. Eight additional children were tested but excluded from data analyses because of technical issues during the test session (N = 6), uncooperative behavior (N = 1) and concentration deficit resulting in >50% incorrectly answered control question in the change-of-location task (N = 1). The age range was chosen so broadly in order to compare children who show and do not show the performance difficulties in the TB task. Participants in this and all subsequent studies were recruited from a database of children whose parents had previously given consent to experimental participation as well as via social media.

Material

Test for syntactic recursion. The task adapted from Arslan and colleagues (2017) tested for the comprehension of embedded relative clauses in German language [39]. Children saw two rows of animals (upper and lower row) on the screen (Fig 1). Each animal was displayed on a different background color. The children were asked to name the location of a corresponding animal on the screen (e.g., “Where is the cow that strokes a horse?” for the first-order syntactic recursion test question). Children had to refer to the animal’s location in naming the corresponding background color and the row in which the animal was placed (e.g., “yellow, upper row”). The test questions containing the relative clauses were scaled from first order until fourth order of syntactic recursion and could be repeated up to four times [39]. For a detailed procedure, see S1 File.

Fig 1

Material for test question with a first-order syntactic recursion “Where is the cow that strokes a horse?”, Correct answer “yellow, upper row”.

Standard change-of-location task. The children received four trials of the standard change-of-location tasks with different stimuli [2] implemented in short, animated video clips. Protagonist A and her object O were presented to the child before Protagonist A placed O in one of two boxes (box 1). In her presence (TB condition) or absence (FB condition), protagonist B came into the scene and moved O to the other box (box 2) and the following test and control questions were asked: Test question: “Where does Protagonist A think that O is?” [correct answer box 1 for FB, box 2 for TB condition] Control Question 1: “In which box was O in the beginning?” [correct answer: box 1] Control Question 2: “Where is O now?” [correct answer: box 2] The TB and FB trials were presented in alternating order beginning with a TB trial and the children saw a frozen still image of the last frame of the scenario (Protagonist A and the two boxes) when answering the questions. Recursive ToM task: Understanding. The children heard three stories (partly adapted from [33]) accompanied by animated video clips and were asked to answer test questions about the characters’ mental states afterwards. The test questions were scaled from second order to fifth order of mental state recursion and children had to decide which of two sentences was true regarding the story line. The sentences were read out by a voice and displayed with pictures on the screen (Fig 2). To choose one of the two sentences, children could either name the side/color of the picture on the screen or repeat the sentence.

Fig 2

Screenshots of the visual animation of the fourth-order Theory of Mind test question in the story line “the video dilemma”.

Screenshots of the visual animation of the fourth-order Theory of Mind test question in the story line “the video dilemma”.

Note. Children heard a voice slowly reading out the answer sentences while the animation was presented accordingly on the screen. E.g., “Sarah hopes [picture a] that Olli believes [picture b] that she knows [picture c] that Mrs. Brown wants [picture d] them to watch a pirate film [picture e]”. After that the second answer option (red side) was read out and presented accordingly. Example story: the video dilemma (adapted from [33]) This is Sarah and this is Olli. Sarah and Olli are in the same class at school. “Hi, I’m Sarah!’, “and I’m Olli”. Their teacher is Mrs. Brown. Today Mrs. Brown suggests that Sarah and Olli should bring a video to school tomorrow to watch with the other children. Mrs. Brown also says to them, “Make sure you bring a film that I will like too!” (Mrs. Brown leaves the scene). Sarah’s favorite videos are pirate videos. Olli’s favorite videos are horse films. Which will it be? A pirates or a horse film? Olli says to Sarah, “We just can’t decide so I think that we should take the film that Mrs. Brown would like. Sarah, do you know which one Mrs. Brown would like best?” Sarah is thinking about that. She does not have a clue which film Mrs. Brown would like. But Sarah decides to tell Olli that she knows that Mrs. Brown likes pirate films best. Sarah thinks that this will make Olli agree to take a pirate video to school. Olli listens to this and then Olli says, “We will take a video of pirates then.” So, Sarah gets to enjoy her favorite film! Memory question Which sentence is true? a) Sarah likes pirates films best. b) Sarah likes horse films best. Test question (ToM Level 4) Which sentence is true? a) Sarah hopes that Olli believes that she knows that Mrs. Brown wants that they watch a pirate film.* b) Sarah hopes that Olli believes that she doesn’t know that Mrs. Brown wants them to watch a pirate film. *German translation with that-complement for want (“möchte, dass”) Task for advanced pragmatics understanding. Children received two trials of a pragmatic language task testing for their metaphor and irony understanding [partly adapted from and inspired by 20, 26, 30, 31]. Each trial consisted of a story about two characters accompanied by three pictures (Table 2 and example below).

Table 2

Example story for Advanced Pragmatics Understanding Task.

Picture	Content	Example
1	Information for metaphor	Lisa is running through the apartment all day. She plays with the ball, jumps on the sofa and plays tag with the cat.
1	Test Question about metaphor	What fits the best? Lisa is…• A cloud• A crocodile• A whirlwind (Correct Answer: Metaphor used in German for very active children)• A tree
2	Story line	In this moment, Lisa is running so fast that she hits the table and all the books fall on the ground.
3	Ironic utterance	Lisa’s older brother enters the rooms and says, “You’re very careful today.”
	Questions about ironic utterance	Test question 1: Does the brother want Lisa to believe that he thinks that she was careful?
		Test question 2: Why does the brother say, “You’re very careful today”?
		Control question: Does the brother find that Lisa was careful?

The questions (and answer options) could be repeated up to four times. During the ironic utterance (Table 2, Picture 3), the speaker’s face was not visible to avoid any inferences from their facial expression. To answer the second test question correctly, children had to refer to the speaker’s mental state or attitude to the other agent’s behavior or refer to the negative outcome of the other agent’s behavior or to the opposite/ ironic meaning of the utterance. Answers to this test questions were coded with a fixed coding scheme (adapted from [25, 26], see S1 File).

Results

Coding of predictors

In the syntactic recursion task and the recursive ToM understanding task, children were coded with the highest level of recursion until that they performed consistently correct (e.g., child is coded with “3” when she answers the test questions for level 1–3 correctly, but the test question for level 4 incorrectly). For advanced pragmatics, children received the score of correct trials for the metaphor test questions, the irony test question 1 and irony test question 2 separately (0–2 each). For the coding scheme for irony test question 2 and interrater reliabilities, see S1 File.

Plan of analysis

In a first step, we assured that the children responded consistently in the two trials of the same condition in the change-of-location task, so that we were able to code children’s performance in this task for the subsequent analysis in a binary format (passers vs. non-passers). Second, in scope of the preliminary analyses, we tested for the typical performance in the TB and FB task in computing comparisons against chance level performance for both TB and FB in the three age groups (young, middle, old). Children of all groups were expected to perform above chance level in the FB task whereas only the oldest age group (9;4–10;11 years) was expected to perform better than chance level performance in the TB task. Additionally, we computed correlations between FB and TB performance which were expected to be negative for the two younger age groups and positive for the oldest age group only. To address the main research question of factors that influence the performance in the TB task, we computed a logistic regression model. In the logistic regression model, TB performance (passing vs. no-passing) was predicted by recursive syntactic abilities, recursive ToM understanding, advanced pragmatics understanding and children’s age. We compared this full model with a control model containing only children’s age in months.

TB and FB performance: Consistency across trials in the standard change-of-location task

The consistencies in performance of children over the two trials of the same condition of the standard change-of-location task were high. The percentage of children who had two available trials (meaning all control questions answered correctly) and showed the same performance in both trials was 85.90% (Φ = .68) for the TB trials and 98.75% (Φ = n/c, due to at least one constant variable) for the FB trials. Therefore, both trials were included in the analysis. For the following analysis, the TB and FB performance were coded as binary variables. Children had to pass both trials of a condition to be assigned to the group of passers. Children failing in one or both trials of a condition were assigned to the group of non-passers.

Preliminary analyses: TB and FB performances in different age groups of children

The performance in the change-of-location task as a function of belief type and age is depicted in Fig 3.

Fig 3

Children’s performances in the standard change-of-location task as a function of age group and belief type in Study 1.

To test for the failure in the TB condition and the success in the FB condition of younger children and the success in both conditions in older children, we computed Wilcoxon signed rank tests against chance level performance (0.5) for the three age groups and the two belief conditions. The Wilcoxon tests showed that the youngest age group (6;0–7;7-year-olds) performed significantly above chance in the FB condition (M = .93, p < .0001, r = -.85). The tests could not be computed for the two older age groups due to ceiling effects in the FB condition (7;8–9;3-year-olds and 9;4–10;11-year-olds M = 1). In contrast, Wilcoxon signed rank tests revealed that in the TB condition, only the oldest age group of children (9;4–10;11-year-olds) performed significantly above chance (M = .74, p < .02, r = —.48). Younger children (6;0–7;7-year-olds and 7;8–9;3-year-olds) performed at chance level (6;0–7;7-year-olds: M = .41, p = .34, r = —.18; 7;8–9;3-year-olds: M = .63, p = .18, r = -.26). The correlation between the TB and FB performance in the change-of-location task is (not-significantly) negative for the whole sample (r(phi) = -.13, p = .24) as well as for the youngest age group (6;0–7;7-year-olds: r(phi) = -.34, p = .08). Because of the ceiling effects in the FB condition, the correlation is not computable for the two older age groups.

Main analyses: Predictors for TB performance

We removed children failing the first-order FB condition (N = 2) from the following analyses. This was based on the assumption that children who still do not succeed in the first-order FB task use different cognitive strategies to solve the TB task compared to the group of children we aim to examine here. Descriptive statistics. The mean performances in the recursive ToM understanding task, the advanced pragmatic language task and the syntactic recursion task as a function of TB performance and age are summarized in Table 3.

Table 3

Mean performance (M) and standard deviations (SD) in Syntactic Recursion (Synt. Recurs.), Recursive ToM Understanding (RToM U), metaphor understanding (Metaphor), irony understanding in the first and second test question (Irony1 and Irony2) and for TB non-passers (noTB) and TB passers (TB) in three groups of age.

	n		Synt. Recurs.		RToM U		Metaphor		Irony1		Irony2
	n		M(SD)		M(SD)		M(SD)		M(SD)		M(SD)
Age	noTB	TB	noTB	TB	noTB	TB	noTB	TB	noTB	TB	noTB	TB
6;0–7;7	16	11	2.19 (1.11)	2.00 (1.5)	2.13 (1.36)	2.44 (0.88)	1.33 (0.62)	1.78 (0.67)	1.44 (0.81)	1.00 (1.00)	1.40 (0.83)	0.89 (0.93)
7;8–9;3	10	17	2.80 (0.92)	2.94 (1.25)	2.00 (0.67)	4.18 (1.19)	1.80 (0.42)	1.82 (0.39)	1.00 (0.94)	1.35 (0.79)	1.40 (0.84)	1.24 (0.83)
9;4–10;11	7	20	2.43 (1.62)	3.00 (0.97)	2.57 (1.40)	3.90 (1.37)	1.71 (0.49)	1.80 (0.41)	1.71 (0.49)	1.55 (0.60)	1.86 (0.38)	1.75 (0.44)

Note. Possible range of performances for recursive ToM understanding: 0–5, for pragmatic language task: 0–2, for syntactic recursion task: 0–4.

Note. Possible range of performances for recursive ToM understanding: 0–5, for pragmatic language task: 0–2, for syntactic recursion task: 0–4. For a more detailed summary of answers to irony test question 2, see S1 File. Logistic regression models. We estimated the effect of the different predictors of mental state ascription on the TB performance using a multiple logistic regression model. To control for children’s age in months, we included it into the model, too. Prior to fitting the model, we checked for the assumptions. We checked for multicollinearity (all VIFs ≤ 1.38) and linearity of the logit for age (b = 0.08, p = .81), recursive ToM understanding (b = 1.48, p = .73) and syntactic recursion (b = -2.98, p = .27). We compared the fit of the full model with that of a null model with the control variable only (TB ~ age). As the model comparison is significant, the predictors of mental state ascription have an impact on the TB performance (Model X2(6) = 24.58, p < .001). More specifically, an increased ability in recursive ToM understanding lead to increased TB performance (B = 0.84, p < .001***, OR = 2.33). Pragmatic language abilities, age and syntactic recursion abilities did not affect the TB performance significantly (Table 4).

Table 4

Results of the logistic regression model predicting children’s TB performance with their age in months and their performance in tasks of syntactic recursion (Synt. Recurs.), Recursive ToM understanding (RToM U), and Advanced Pragmatics (Metaphor and Irony1 and Irony2 for irony test questions 1 and 2).

	B(SE)	z	p	95% CI for Odds Ratio
	B(SE)	z	p	Lower	Odds Ratio	Upper
Included
Constant	-4.10 (1.89)	-2.17	.03*
Age in months	0.02 (0.02)	1.29	.20	0.99	1.02	1.06
Synt. Recurs.	-0.19 (0.29)	-0.66	.51	0.46	0.83	1.44
RToM U	0.84 (0.23)	3.67	< .001***	1.54	2.33	3.84
Metaphor	0.71 (0.60)	1.18	.24	0.64	2.03	6.95
Irony1	0.13 (0.39)	0.34	.73	0.52	1.14	2.48
Irony2	-0.68 (0.43)	-1.59	.11	0.21	0.51	1.15

Note. R2 = .44 (Nagelkerke). Model X2(6) = 24.58, p < .001.

***p < .001.

Note. R2 = .44 (Nagelkerke). Model X2(6) = 24.58, p < .001. ***p < .001. Fig 4 pictures this difference in performance in recursive ToM understanding between TB-passers and TB-non-passers.

Fig 4

Children’s performance in recursive ToM understanding as a function of their TB performance (passers vs. non-passers) across age.

Post-hoc analyses. We computed post-hoc one-sided Wilcoxon rank sum tests for TB-passers versus TB-non-passers in recursive ToM understanding as it significantly predicted the outcome in the logistic regression model. Due to multiple testing, Bonferroni correction was applied and resulted in an alpha value of 0.0125 (0.05/4) for this post-hoc computation. The comparison of the performance shows a significant difference in the recursive ToM understanding for TB-non-passers (M = 1.91, n = 33) against TB-passers (M = 3.72, n = 46, W = 319.5, p < .0001, r = -.53) for the whole sample. For the comparison within the age groups, Wilcoxon rank sum tests reveal that the performance differs significantly for the middle age group between TB-non-passers (M = 1.80, n = 10) and TB-passers (M = 4.18, n = 17, W = 16, p < .0001, r = -.73). This difference is not significant for the youngest age group (M(noTB) = 1.75, n = 16; M(TB) = 2.44, n = 9, W = 50.5, p = .09, r = -.34) and the oldest age group of children (M(noTB) = 2.43, n = 7; M(TB) = 3.90, n = 20, W = 34.5, p = .02, r = -.45). The achieved power was computed post-hoc for the logistic regression model. For the significant predictor recursive ToM understanding, this resulted in a power of 1 –ß = .89.

Discussion

The expected pattern of typical TB and FB performance in children between six and ten years was replicated in the online study: children in the youngest and middle age group (6;0–9;3 years) failed to perform above chance level in the TB task while FB performance was at ceiling. Only children in the oldest age group (9;4–10;11years) performed proficiently in both conditions. Additionally, the study shows first evidence that children’s TB performance can be (partly) explained by their understanding for recursive ToM. However, none of the variables of Advanced Pragmatics understanding or syntactic recursion were significant predictors for children’s TB performance.

Study 2

Study 2, therefore, aimed to replicate this relation between TB performance and children’s recursive ToM. In order to explore the underlying recursive ToM abilities in more detail, Study 2 operationalized children’s recursive ToM twofold: similar to Study 1, children’s understanding for recursive mental state ascriptions was measured. Additionally, children’s recursive ToM production was measured to identify the cognitive mechanisms relevant for the TB task more fine-grained. The final sample included eighty-seven 6- to 10-year-old children (72–131 months, mean age = 101.41 months; 44 girls, 43 boys). Seven additional children were tested but excluded from data analyses because of technical issues during the test session (N = 3), experiential error (N = 1), uncooperative behavior (N = 1), parental interference (N = 1) or children’s age (child turned out to be too old on the day of the test session; N = 1). Test for syntactic recursion. The task for syntactic recursion was administered as in Study 1. Standard change-of-location task. For the standard change-of-location task, we used the same material as in Study 1. As the consistency of the two trials (two FB and two TB trials) in Study 1 was high, only one trial per condition was administered in Study 2. Again, the TB trial was always presented first. Recursive ToM production. Children saw two story lines containing bluffs [partly adapted from 40]. In the first story, a character bluffed to mislead someone. In the second story line, the character first uttered a double bluff, i.e. he told the truth as the opponent expected him to lie. As the second story line continued, the same character again uttered a bluff based on his double bluff: he lied as the opponent now expected him to say the truth (triple bluff). The story lines were presented as short animated video clips. After each utterance, children were asked to explain the bluffs (“Why does he say that?”). To receive detailed answers, E asked the child to explain her answer in more detail. The exact wording of this follow-up question depended on the length of the child’s initial answer. Answers to the open question were coded binary (child did explain the bluff correctly (1) or did not understand the bluff (0)). Children received the highest score of bluffs that were all explained correctly (0–3). Example of task for recursive ToM production Double bluff [shortened, adapted from 40] The treasure diggers want to find the pirates’ treasure. They know that the treasure is either in the field or in the mountains. They hold one of the pirates captive and ask him where the treasure is. The captured pirate is very brave and very smart, he will not let the treasure diggers find the treasure. The treasure is in the mountains. When the treasure diggers ask the pirate where the treasure is, the pirate answers, "in the mountains". Control question: Is that right what he said? (correct answer: yes) Test Question: Why did he say that? For follow-up test question depending on length of the answer to initial test question, see S1 File. Recursive Tom: Understanding. The task testing for the understanding of recursive ToM was taken over from Study 1 and extended: Children received one additional story and four additional test questions resulting in four story lines, two test questions per story and order of recursion (for more details, see S1 File). If a child answered both second order questions or more than 50% of the first four test questions incorrect, the test session was terminated after the third story line. In the syntactic recursion task and the recursive ToM understanding and production task, children were coded with the highest level of recursion up to which they answered the test question/ both test question correctly (e.g., child is coded with “3” for recursive ToM understanding when she answers all test questions for level 1–3 correctly, one test question for level 4 correctly and one test question for level 4 incorrectly). For recursive ToM production, children receive the score of the respective bluff they explained correctly (0–3; “1” when they only explained the bluff correctly, “2” when they explained the bluff and the double-bluff correctly and “3” when they explained all three bluffs correctly). For interrater reliabilities, see S1 File. The analysis was conducted in close similarity to Study 1. In scope of the preliminary analyses, we tested for the typical performance in the TB and FB task in computing comparisons against chance level performance for both TB and FB in the same three age groups (young, middle, old) and computed correlations between FB and TB performance. To address the main research question of factors that influence the performance in the TB task, we again computed a logistic regression model. In the logistic regression model, TB performance (passing vs. no-passing) was predicted by children’s age, recursive syntactic abilities, recursive ToM production and recursive ToM understanding. We compared this full model with a control model containing only children’s age. The performance in the change-of-location task as a function of belief type and age is depicted in Fig 5.

Fig 5

Children’s performance in the change-of-location task as a function of belief type and age in Study 2.

Note. Sample size of the groups vary as only children were included who answered the respective control questions correctly.

Children’s performance in the change-of-location task as a function of belief type and age in Study 2.

Note. Sample size of the groups vary as only children were included who answered the respective control questions correctly. To test for the failure in the TB condition and the success in the FB condition of younger children and the success in both conditions in older children, we computed Wilcoxon signed rank tests against chance level performance (0.5) for the three age groups and the two belief conditions. Wilcoxon showed that the youngest age group (6;0–7;7-year-olds) performed significantly above chance in the FB condition (M = .86, p < .001, r = -.70). Wilcoxon tests could not be computed for the two older age groups due to ceiling effects in the FB condition (7;8–9;3-years-olds and 9;4–10;11-year-olds: M = 1). In contrast, Wilcoxon signed rank tests revealed that in the TB condition, only the oldest age group of children (9;4–10;11-year-olds) performed significantly above chance (M = .76, p < .01, r = -.51). Younger children (6;0–7;7-year-olds and 7;8–9;3-year-olds) performed at chance level (6;0–7;7-year-olds: M = .43, p = .46, r = -.14, 7;8–9;3-year-olds: M = .59, p = .36, r = -0.17). The correlation between the TB and FB performance in the change-of-location task is (not-significantly) negative for the whole sample (r(phi) = -.19, p = .10) and significantly negative for the youngest age group (6;0–7;7-year-olds: r(phi) = -.47, p = .02). Because of the ceiling effects in the FB condition, the correlation is not computable for the two older age groups. Descriptive statistics. The mean performances in the recursive ToM understanding task, the recursive ToM production task and the syntactic recursion task as a function of TB performance (passers vs. non-passers) and age (young, middle, old age group) are summarized in Table 5.

Table 5

Mean performance (M) and standard deviations (SD) in Recursive ToM Understanding (RToM U), Recursive ToM Production (RToM P) and understanding of syntactic recursion (Syntact Recurs.) for TB non-passers and TB passers in three groups of age.

	n		Synt. Recurs.		RToM P		RToM U
	n		M(SD)		M(SD)		M(SD)
Age	noTB	TB	noTB	TB	noTB	TB	noTB	TB
6;0–7;7	16	12	2.19 (1.11)	2.08 (1.16)	1.06 (0.25)	1.17 (0.39)	1.25 (1.06)	1.17 (1.03)
7;8–9;3	12	17	3.00 (0.95)	3.00 (1.12)	1.67 (0.89)	2.18 (1.07)	2.92 (0.90)	2.65 (1.46)
9;4–10;11	7	22	2.57 (1.27)	3.00 (1.02)	1.86 (0.90)	1.95 (0.90)	2.71 (1.60)	3.36 (1.43)
all	35	51	2.54 (1.12)	2.78 (1.14)	1.43 (0.74)	1.84 (0.95)	2.11 (1.37)	2.61 (1.59)

Note. Possible range of performance for Recursive ToM Understanding task: 0–5, for Recursive ToM Production: 0–3, for Syntactic Recursion task: 0–4.

Note. Possible range of performance for Recursive ToM Understanding task: 0–5, for Recursive ToM Production: 0–3, for Syntactic Recursion task: 0–4. Logistic regression models. We again removed children failing the first-order FB condition (or having no FB trial available due to incorrect answers to the control questions in the FB condition, n = 10) from the following analyses. Prior to fitting the model, we checked for the assumptions. We checked for multicollinearity (all VIFs ≤ 1.49) and linearity of the logit for Age (b = 0.11, p = .61), recursive ToM understanding (b = 1.32, p = .43), recursive ToM production (b = 1.32, p = .43) and syntactic recursion (b = 1.93, p = .15). None of the predictors in the full model predicted significantly children’s TB performance (Table 6). The comparison of the full model with the null model containing the control variable only (TB ~ age) was not significant (X2(3) = 2.95, p = .40).

Table 6

				95% CI for Odds Ratio
	B(SE)	z	p	Lower	Odds Ratio	Upper
Included
Constant	-3.09 (1.52)	-1.88	.04
Age in months	0.02 (0.02)	1.04	.15	0.99	1.02	1.06
Synt. Recurs.	0.12 (0.24)	0.41	.62	0.70	1.12	1.80
RToM P	0.26 (0.36)	0.73	.47	0.66	1.30	2.75
RToM U	0.11 (0.20)	0.54	.59	0.75	1.11	1.67

Note. R2 = .16 (Nagelkerke). Model X2(3) = 2.95, p = .40

Note. R2 = .16 (Nagelkerke). Model X2(3) = 2.95, p = .40 Post-hoc comparisons. We did not compute post-hoc one-sided two-sample Wilcoxon tests for TB-passers versus TB-non-passers in recursive ToM as it did not significantly predict the outcome in the logistic regression model in Study 2. Study 2 did not show the relationship between children’s TB performance and their recursive ToM found in Study 1, neither operationalized in terms of their understanding for recursive ToM nor in their own production of recursive ToM. The same task for children’s recursive ToM understanding was used as in Study 1, however, the task was extended by an additional storyline and additional test questions and the task for recursive ToM production was added. This increased the duration of testing significantly and may have had a negative outcome on children’s concentration and motivation in the test session and, therefore, the validity of results.

Study 3

We therefore conducted Study 3 in which the initial relationship between children’s TB performance and their recursive ToM understanding was tested without additional tasks testing for Advanced Pragmatics or recursive ToM production to reduce the duration of the test session and hold children’s concentration as constant as possible. The final sample included sixty-four 6- to 10-year-old children (72–131 months, mean age = 102 months; 32 girls, 32 boys). Five additional children were tested but excluded from data analyses because of technical issues during the test session (N = 4) and uncooperative behavior (N = 1). The task for syntactic recursion, the standard change-of-location task and the task for recursive ToM understanding were administered with the same material and procedure as in Study 2. In contrast to Study 2, there was no termination rule for the task testing for recursive ToM understanding. As in Study 2, children were coded with the highest level of recursion up to which they answered all test question correctly (that is, one test question in the syntactic recursion task and two test questions in the recursive ToM understanding task). The analysis was conducted in close similarity to Study 1 and 2. In scope of the preliminary analyses, we again tested for the typical performance in the TB and FB task in computing comparisons against chance level performance for both TB and FB in the three age groups (young, middle, old) and computed correlations between FB and TB performance. To address the main research question of factors that influence the performance in the TB task, we again computed a logistic regression model. In the logistic regression model, TB performance (passing vs. no-passing) was predicted by children’s age, recursive syntactic abilities and recursive ToM understanding. We compared this full model with a control model containing only children’s age. The performance in the change-of-location task as a function of belief type and age is depicted in Fig 6. Children were included when they answered the respective control questions correctly.

Fig 6

Children’s performance in the standard change-of-location task as a function of belief type and age in Study 3.

Note. Sample size of the groups vary as only children were included who answered the respective control questions correctly.

Children’s performance in the standard change-of-location task as a function of belief type and age in Study 3.

Note. Sample size of the groups vary as only children were included who answered the respective control questions correctly. To test for the failure in the TB condition and the success in the FB condition of younger children and the success in both conditions in older children, we computed two-sided Wilcoxon tests against chance level performance (0.5) for the three age groups and the two belief conditions. The Wilcoxon tests showed that the youngest age group (6;0–7;7-year-olds) performed significantly above chance in the FB condition (M = .95, p < .0001, r = -.87). Wilcoxon tests could not be computed for the two older age groups due to ceiling effects in the FB condition (7;8–9;3-year-olds and 9;4–10;11-year-olds M = 1). In contrast, Wilcoxon tests revealed that in the TB condition, only the oldest age group of children (9;4–10;11-year-olds) performed significantly above chance (M = .82, p < .01, r = -.65). Younger children performed at chance level (6;0–7;7-year-olds: M = .57, p = .53, r = -.14, 7;8–9;2-year-olds: M = .48, p = .84, r = -.04). The correlation between the TB and FB performance in the change-of-location task is (non-significantly) negative for the whole sample (r(phi) = -.10, p = .46) and (non-significantly) negative for the youngest age group (6;0–7;7-year-olds: r(phi) = -.19, p = .43). Because of the ceiling effects in the false belief condition, the correlation is not computable for the two older age groups. Descriptive statistics. The mean performances in the syntactic recursion task and the recursive ToM understanding task as a function of TB performance (passers vs. non-passers) and age (young, middle, old age group) are summarized in Table 7.

Table 7

Mean performance (M) and standard deviations (SD) in understanding of syntactic recursion (Synt Recurs.) and Recursive ToM Understanding (RToM_U) for TB non-passers and TB passers in three groups of age.

	n		Synt. Recurs.		RToM U
	n		M(SD)		M(SD)
Age	noTB	TB	noTB	TB	noTB	TB
6;0–7;7	9	12	2.11 (1.17)	2.83 (0.83)	2.00 (0.50)	1.25 (0.62)
7;8–9;3	11	10	2.55 (1.21)	3.00 (0.94)	1.73 (1.01)	2.50 (1.08)
9;4–10;11	4	18	3.00 (1.15)	3.11 (1.28)	2.75 (1.71)	3.50 (1.42)
all	24	40	2.46 (1.18)	3.00 (1.06)	2.00 (1.02)	2.58 (1.48)

Note. Possible range of performances for syntactic recursion task: 0–4, for Recursive ToM Understanding task: 0–5.

Note. Possible range of performances for syntactic recursion task: 0–4, for Recursive ToM Understanding task: 0–5. Logistic regression models. We again removed children failing to answer first-order FB test and/or control questions correctly (n = 4) from the following analyses. Prior to fitting the model, we checked for the assumptions. We checked for multicollinearity (all VIFs ≤ 1.55) and linearity of the logit for age (b = 0.19, p = .40), syntactic recursion (b = -2.42, p = .11) and recursive ToM understanding (b = 1.61, p = .18). None of the predictors in the full model predicted significantly children’s TB performance (Table 8). The comparison of the fit of the full model with the null model with the control variable only (TB ~ age) was not significant (X2(1) = 0.07, p = .79).

Table 8

Results of the logistic regression model predicting children’s TB performance with their age in months and their performance in tasks of syntactic recursion (Synt. Recurs.) and Recursive ToM understanding (RToM U).

				95% CI for Odds Ratio
	B(SE)	z	p	Lower	Odds Ratio	Upper
Included
Constant	-2.80 (1.80)	-1.56	.12
Age in months	0.02 (0.02)	1.23	.26	0.98	1.02	1.06
Synt R	0.34 (0.26)	1.31	.19	0.85	1.40	2.37
RToM U	0.08 (0.29)	0.26	.79	0.61	1.08	1.93

Note. R2 = .12 (Nagelkerke). Model X2(1) = 0.07, p = .13.

Note. R2 = .12 (Nagelkerke). Model X2(1) = 0.07, p = .13. Similar to Study 2, the present study shows the typical TB performance in children between six and ten years but fails to replicate the relationship between children TB performance and their recursive ToM understanding that was found in Study 1. As neither children’s syntactic recursion abilities nor their recursive ToM understanding predicted their TB performance, the results of the study therefore do not match with the predictions made by the pragmatic performance analysis.

Supplementary explorative analysis: Correlations of predictors in Study 1–3

In addition to the planed analysis, we conducted an exploratory analysis to investigate how the various predictors relate to each other. To this end, we computed partial correlations for the predictors in each study controlled for children’s age in months (Table 9).

Table 9

Partial correlation of predictors in Study 1–3 controlled for children’s age in months.

	Study 1					Study 2		Study 3
	1	2	3	4	5	1	2	1
1. Synt
2. RToM P						.27**
3. RToM U	.30**					.07	.11	.32**
4. Meta	.15		.15
5. Irony1	-.16		-.04	-.14
6. Irony2	.20		.00	.01	.37***

** indicates p < .01

*** indicates p < .001.

** indicates p < .01 *** indicates p < .001. Table 9 shows a medium correlation between children’s performance in the syntactic recursion task and recursive ToM understanding task for Study 1 and 3, but not in study 2. In study 2, however, children’s performance in the syntactic recursion and recursive ToM production task correlated significantly. Additionally, children’s performance in the first and second irony test question of the advanced pragmatics task in Study 1 show a medium to high correlation (Table 9).

General discussion

Background to the present study was a puzzling empirical phenomenon: studies that administered the TB version of the classical change-of-location task to a broad age range of children yielded a surprising U-shaped performance curve: young children master this task, then from around age four children come to fail and performance only recovers around age eight to ten. Based on evidence that suggests that the decline in performance around age four reflects pragmatic confusions, the current set of studies tested whether performance recovery at the end of the U-shaped performance curve can be explained by another developmental change in children’s pragmatic understanding. The studies therefore aimed to replicate the typical TB pattern (in an online testing format) and addressed potential factors that might explain the end of the U-shaped curve in two ways: Is the TB pattern a function of advanced pragmatic development? Even more generally: Is it a function of recursive ToM or recursive thinking in general? The results, first of all, replicate the typical TB pattern in children between six and ten years: only the oldest age group (9;4–10;11 years) answered the TB task correctly while younger children performed at chance level. Investigating potential factors for the TB performance, the studies extended existing research methods by measuring children’s recursive ToM and syntactic recursive abilities. Results revealed that children’s syntactic recursion abilities and their recursive ToM understanding (Study 1 and 3) and recursive ToM production (Study 2) correlated substantially–indicating that they tap a common underlying capacity for recursive thinking. However, the results regarding the main research question remain inconclusive: Study 1 suggests that children’s recursive ToM, but not their advanced pragmatics understanding or general recursive thinking abilities, predict their TB performance. This relationship, though, could not be replicated in Studies 2 and 3 in which neither recursive ToM nor recursive thinking in general explained children’s performance in the TB task. The studies overall, therefore, do not provide clear evidence for the pragmatic analyses of children’s performance recovery in the TB task around age eight to ten. Of course, absence of evidence for a solid relationship between advanced pragmatics understanding, recursive ToM and recursion in general and TB performance does not amount to evidence of absence of any such relationship. There might be an actual relationship between children’s advanced pragmatics and related factors and their TB performance that could not be reliably shown in the current set of studies because of different reasons. There are different possibilities regarding how this may be the case about which we can here only speculate. One possibility is that there is an actual relationship, at least, between recursive ToM understanding and TB performance as shown in Study 1, that is indeed less pronounced than Study 1 indicated. The sample size calculations for the subsequent studies were based on the potentially overestimated medium effect size from Study 1 which possibly caused Studies 2 and 3 to be too underpowered to detect an effect that may be more subtle. Future studies need to address this potential issue with adequate sample sizes that would also detect smaller relationships. Another possibility is that we failed to detect an actual relationship because of the implementation of the various predictors, especially children’s advanced pragmatic understanding. This was operationalized as children’s understanding for indirect speech acts, more specifically, their understanding of metaphors and irony–based on approaches that suppose non-literal language comprehension involves the ascription of complex intentions. The pragmatic analysis of the TB tasks predicts that, by ascribing such complex communicative intentions, children at the end of the U-shape curve resolve their pragmatic confusions. However, much daily non-literal language including frozen metaphors (whose metaphorical content is “dead”) and frequently used ironic remarks is conventionalized in language [24, 41]. In cases of such conventionalized metaphor use as “The ATM swallowed my credit card” the hearer usually understands what is said without processing the literal meaning first and making inferences about the speaker meaning in a second step [42 p. 116]. Similarly, ironic utterances that are used with high frequency and familiarity (e.g., “That’s just great”) might become, to some degree, conventionalized. As a consequence, their non-literal meaning might become directly accessible [41], (see also [43] for conventionalized versus non-conventionalized indirect requests). In contrast, understanding un-conventionalized non-literal language requires the hearer to make pragmatic inferences based on her world knowledge, the context and the lexical meaning of the utterance [24 p. 247] and therefore tap a different set of cognitive skills than conventionalized non-literal language does [44]. Another possibility is that we failed to detect a relation to children’s non-literal language understanding since the relevant pragmatic abilities involved in the comprehension of the TB task, on the one hand, and of irony and metaphor comprehension, on the other hand, might be quite different. (We thank the anonymous Reviewer of PloS ONE for sharing this issue in their review of an earlier version of this manuscript). Ironic and metaphoric speech acts involve a mismatch of sentence and speaker meaning. The pragmatic challenge is that the hearer needs to overcome the literal sentence meaning to be able to grasp the speaker meaning. In the TB task, however, sentence and speaker meaning are aligned. The challenge of the TB task might rather be to understand the speaker’s conversational goal as the task and the answer to the test question are obvious and common ground and do not allow any other perspective on the scenario. Both the understanding for non-literal language and the TB task might be applications of advanced pragmatics understanding. However, they might not necessarily be seen as a unified phenomenon nor be subject to a same development [45]. Future studies, therefore, need to carefully operationalize these predictors of advanced pragmatics in order to measure children’s abilities to ascribe recursive communicative intentions in valid ways. To this end, future work will need to test whether an operationalization with non-conventionalized non-literal language (such as novel metaphors and unfamiliar ironic utterances) or other pragmatics measures (e.g., understanding academic test question in general, understanding of literal meaning that changes with variations in context etc.) may succeed in measuring children’s abilities for pragmatic inferences and recursive intention ascription and finding a relationship to children’s TB performance. Further future studies might test children’s abilities in ascribing recursive intentions beyond the scope of (non-literal) verbal language understanding. An alternative that completely avoids any problems of interference of individual language knowledge would be to measure children’s abilities in ascribing recursive communicative intentions in non-verbal scenarios. That might test the capacity for recursive, higher-order mindreading (that may also be applied to pragmatic language use) more generally. Cooperative coordination scenarios such as, for example, Stag Hunt scenarios, provide an elegant opportunity to ask children to reason recursively about other’s non-verbal communicative acts. Stag Hunt scenarios [based on a parable by Jean-Jacque Rousseau, see 46] are game-theoretic interactions that require participants to cooperate with their fellow players to achieve a joint goal. Typically, they represent situations in which a player can choose between two options: the player can either hunt a hare (i.e. win a low value price) on her own or hunt a stag (i.e. win a high value price). If she decides to hunt a hare, the player will succeed independently of the other player’s decision; if she decides to hunt a stag, in contrast, she only succeeds if the other player does so as well [46]. In a child-friendly adaptation, Wyman and colleagues showed that preschoolers succeed in coordinating with an adult co-player based on minimal non-verbal communication [47]. As the high-value option was only occasionally available, children needed to base their decision not only on what they themselves saw (“There is a stag”) but also on what they thought about their fellow player (“She saw that there is a stag”) and on what they inferred what their fellow player potentially thought about the them (“She knows that I know that she knows that there is a stag”). Children inferred the mutual knowledge of what their fellow player saw, knew and intended based on minimal non-verbal communicational cues (eye contact and smiling) and successfully based their decision for one of the two options on these inferences [47]. In a structurally related task by Grueneisen and colleagues, children were tested in a peer coordination scenario [48]. Children had to anticipate their partner’s game decision based on a second-order mental state ascription (“She does not know that I know that p. She thinks that I falsely believe that q.”) and had to coordinate their own game decision accordingly. Six-year-olds demonstrated their capacity to use such second-order mental state attributions to successfully coordinate with their peer without any communication [48]. Future studies could be based on such methods and thereby ask children to coordinate non-verbally in game scenarios with increasing complexity. Such adaptations might involve multiple co-players and, therefore, require even more complex recursive inferences (e.g., “Player 1 thinks that Player 2 did not see that Player 3 saw that there is a stag”). A systematic analysis of children’s decisions in such game scenarios might then indicate the level of recursion at which they are able to reason. A further line of future research will need to address the more general question of a unitary, domain-general development of recursive capacities. The current studies tested for children’s recursive syntactic abilities until fourth order of recursion and children’s recursive ToM understanding until fifth order of recursion as well as children’s recursive ToM production in bluff scenarios. The results of correlated performance in recursion of mental representations and syntactic recursion provide first evidence for a shared underlying ability of general recursive thinking. In future research, these preliminary results need to be validated and extended over various forms of recursive thinking. Recently, it was theorized that embedding of recursive temporal representations shares conceptual similarities with embedding of recursive mental state representations [36]. Future studies thus need to compare children’s development in holding recursive temporal representations (i.e. different forms of mental time travel) and higher-order ToM. Potential parallel trajectories of increasing levels of reasoning independently of its domain (i.e. mental states or temporal representations) would indicate shared underlying capacities of general recursive reasoning.

Conclusion

In conclusion, the current set of studies replicate the typical TB performance pattern in children between six and ten years in an online testing format but do not provide clear evidence for an underlying development in advanced pragmatics that can explain this pattern. Future studies need to investigate more thoroughly whether this absence of evidence marks evidence of absence of any such relation between pragmatics and TB performance; or whether such relations exist but can be tapped only with suitably modified measures. (PDF) Click here for additional data file. (XLSX) Click here for additional data file. (XLSX) Click here for additional data file. (XLSX) Click here for additional data file. 16 Dec 2021

PONE-D-21-34148

How do children overcome their pragmatic performance problems in the True Belief Task? The Role of Advanced Pragmatics and Higher-order Theory of Mind

PLOS ONE Dear Dr. Schidelko, Thank you for submitting your manuscript to PLOS ONE. I have sent it to two expert reviewers and have now received their comments back. As you can see at the bottom of this email and attached, the reviewers are quite positive about the manuscript. Both reviewers think that the paper addresses an interesting topic and that the studies are sound and informative. I do agree with their assessment. However, you will see that the reviewers also have some suggestions for improvement, notably in the introduction and discussion. I encourage you to take into account the reviewers' comments in a revised version of the manuscript. Please submit your revised manuscript by Jan 30 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Jérôme Prado Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. 3. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well. 4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer #1: Summary In this paper, Schidelko, Proft, and Rakozcy investigate possible mechanisms underlying the U-shaped curve in true belief task performance between the ages of 5 and 10. Earlier research suggests that initial failures in TB tasks is due to pragmatic difficulties when interpreting the test question. However, little is known about the developmental processes that lead older children to eventually overcome these difficulties and pass the task. In keeping with the pragmatic development explanation of children’s earlier failures, the authors aimed to test the hypothesis that children’s later successes on TB tasks are due to the development of more advanced pragmatic abilities. To test this hypothesis, the authors measured the relationship between TB performance and several different measures of general pragmatic competence. The authors ultimately find no reliable relationship between these measures of pragmatic ability and TB tasks. Possible explanations for these null results and future directions are discussed. Overall evaluation This was a well-designed study and a good attempt at explaining a very puzzling phenomenon within the theory of mind development literature. As someone who follows this literature closely, I found the manuscript informative, and I expect that sharing these null results will still ultimately contribute to our understanding of the phenomenon in question. I do think that the introduction and discussion could use some work. While the empirical results of the study are inconclusive, there is an opportunity for a rich theoretical discussion, which I would encourage the authors to expand upon. Comments I found the section in the introduction on pragmatic development a little thin. At the very least this section should provide some kind of survey of how metaphor and irony-comprehension develop in neurotypical populations, since these abilities are being used as a proxy for advanced pragmatic knowledge (e.g. Lecce, S., Ronchi, L., Del Sette, P., Bischetti, L., & Bambini, V. (2019). Interpreting physical and mental metaphors: Is Theory of Mind associated with pragmatics in middle childhood?. Journal of Child Language, 46(2), 393-407). This will better motivate the choice of measures and also help the readers contextualize the results in the broader literature. Relying on a lone citation to Happe’s 1993 paper on metaphor comprehension in ASD is also problematic, as there is more recent evidence that certain forms of pragmatic processing are actually intact in ASD (see Geurts, B., Kissine, M., & van Tiel, B. (2019). Pragmatic reasoning in autism. In Thinking, reasoning, and decision making in Autism (pp. 113-134). Routledge.) In the initial discussion of recursive theory of mind, it would also be helpful to explain how or why higher-order recursive mindreading would be related to TB task performance specifically, rather than just positing it as another general measure of pragmatic competence. Otherwise the choice of task feels slightly undermotivated. This might be accomplished by explaining how recursive mindreading is implicated in TB test-question responses. In the discussion, the authors discuss some of the limitations of the metaphor-comprehension task as a measure of pragmatic understanding. One such limitation is the possibility that metaphorical expressions become highly conventionalized, so that their literal meaning is no longer semantically accessible. This is plausible, as far as it goes. However, this explanation does not extend to the irony comprehension measure of advanced pragmatic ability, which also failed to correlate with performance on the TB task. The meaning of ironic speech acts seems much more context-dependent than the meaning of metaphors, and much less amenable to conventionalization. So we are still left without much of an explanation of this particular result. My two cents on the matter is that the relevant pragmatic abilities involved in understanding academic questions on the one hand and irony and metaphor on the other seem quite different. What makes irony (and perhaps some metaphors) hard is that the primary intention of these speech acts – i.e. the speaker meaning – doesn’t correspond with their secondary intention or sentence meaning. But the speaker meaning and sentence meaning in the TB test question are aligned. The issue is instead related to the fact that the answer to the question is already in the common ground, and so it’s not obvious to the child what the speaker’s conversational goal is. While there is a broad sense in which both of these obstacles are pragmatic in nature, it’s not obvious that what you need to know in order to overcome these obstacles is the same in each case, or that they would stem from similar kinds of experiences. Tentatively, one might even conclude that using metaphor and irony comprehension as a general measure of advanced pragmatic abilities might be misguided, or that pragmatic ability is not a unified phenomenon (see again the Geurts et al. chapter cited above). The other suggestion in the discussion that future studies might focus on purely nonverbal forms of communication also struck me as odd given that children’s initial failures really do seem to be so closely tied to their understanding of the experimenter’s speech act. There are no trivial responses or academic questions in a nonverbal stag hunt game, and it’s not terribly obvious what the one thing has to do with the other. This is not to say that looking into the relationship between these tasks and TB tasks would be fruitless (perhaps it is a better measure of recursive mindreading), but it seems a bit odd for this to occupy so much space in the discussion. On a related note, I found myself wondering whether this failure to detect a reliable correlation with more distally related pragmatic abilities might suggest a different future direction: finding a way to operationalize the specific type of pragmatic competency that underlies TB task performance, according to the pragmatic development account (e.g. understanding of trivial or academic questions). This would be in line with the broader theory Minor correction: On p. 9, a contrast is drawn between children’s recursive mindreading abilities as measured in Liddle and Nettle and adults’ recursive mindreading abilities as measured by O’Grady et al. It is implied that between 10-11 years of age and adulthood, we go from chance on fourth-order recursive ToM tasks to success on 7th-order recursive ToM tasks. But O’Grady and colleagues use a very different, “implicit” measure of recursive mindreading, so these two results are not directly comparable. This should be corrected. Reviewer #2: The comments to the Author are included in the attached file labelled "Review". Please refer to this file. This provides my suggestions to revise the manuscript before publication. My overall assessment is " Minor revisions". ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: Diana Mazzarella [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Submitted filename: PLOS ONE_review.pdf Click here for additional data file. 28 Jan 2022 Reviewer #1 Summary In this paper, Schidelko, Proft, and Rakozcy investigate possible mechanisms underlying the U-shaped curve in true belief task performance between the ages of 5 and 10. Earlier research suggests that initial failures in TB tasks is due to pragmatic difficulties when interpreting the test question. However, little is known about the developmental processes that lead older children to eventually overcome these difficulties and pass the task. In keeping with the pragmatic development explanation of children’s earlier failures, the authors aimed to test the hypothesis that children’s later successes on TB tasks are due to the development of more advanced pragmatic abilities. To test this hypothesis, the authors measured the relationship between TB performance and several different measures of general pragmatic competence. The authors ultimately find no reliable relationship between these measures of pragmatic ability and TB tasks. Possible explanations for these null results and future directions are discussed. Overall evaluation This was a well-designed study and a good attempt at explaining a very puzzling phenomenon within the theory of mind development literature. As someone who follows this literature closely, I found the manuscript informative, and I expect that sharing these null results will still ultimately contribute to our understanding of the phenomenon in question. I do think that the introduction and discussion could use some work. While the empirical results of the study are inconclusive, there is an opportunity for a rich theoretical discussion, which I would encourage the authors to expand upon. Comments 1. I found the section in the introduction on pragmatic development a little thin. At the very least this section should provide some kind of survey of how metaphor and irony-comprehension develop in neurotypical populations, since these abilities are being used as a proxy for advanced pragmatic knowledge (e.g. Lecce, S., Ronchi, L., Del Sette, P., Bischetti, L., & Bambini, V. (2019). Interpreting physical and mental metaphors: Is Theory of Mind associated with pragmatics in middle childhood?. Journal of Child Language, 46(2), 393-407). This will better motivate the choice of measures and also help the readers contextualize the results in the broader literature. Relying on a lone citation to Happe’s 1993 paper on metaphor comprehension in ASD is also problematic, as there is more recent evidence that certain forms of pragmatic processing are actually intact in ASD (see Geurts, B., Kissine, M., & van Tiel, B. (2019). Pragmatic reasoning in autism. In Thinking, reasoning, and decision making in Autism (pp. 113-134). Routledge.) Answer to Comment 1: Thank you for this comment which is in line with the comments provided by Reviewer 2. In response, we added a more explicit description of the pragmatic analysis of children’s performance in the TB task and its parallels to the understanding of non-literal language; and we inserted a new short overview of the available experimental evidence in developmental non-literal language comprehension in the more detailed part “Advanced Pragmatic Understanding” in the Introduction. In the General Discussion, we partly revise and extend this analysis presented in the introduction as it became more fine-grained over the course of the studies. Thank you very much for providing us with several references to improve our manuscript. 2. In the initial discussion of recursive theory of mind, it would also be helpful to explain how or why higher-order recursive mindreading would be related to TB task performance specifically, rather than just positing it as another general measure of pragmatic competence. Otherwise the choice of task feels slightly undermotivated. This might be accomplished by explaining how recursive mindreading is implicated in TB test-question responses. Answer to Comment 2: We now elaborate that children’s recursive mindreading abilities might be related to their TB performance in two ways (see page 11). On the one hand, recursive mindreading enables children to understand initially pragmatic speech acts (here the academic test questions) as they are able to ascribe complex communicative intentions. On the other hand, children might be still confused by the TB test question but use their recursive mindreading abilities to resolve this confusion by ascribing higher-order mental states (e.g., to the experimenter). We now discuss both possibilities in the manuscript and hope that the motivation for the thorough choice of tasks becomes more evident now. Thank you very much for helping us to improve our line of argumentation. 3. In the discussion, the authors discuss some of the limitations of the metaphor-comprehension task as a measure of pragmatic understanding. One such limitation is the possibility that metaphorical expressions become highly conventionalized, so that their literal meaning is no longer semantically accessible. This is plausible, as far as it goes. However, this explanation does not extend to the irony comprehension measure of advanced pragmatic ability, which also failed to correlate with performance on the TB task. The meaning of ironic speech acts seems much more context-dependent than the meaning of metaphors, and much less amenable to conventionalization. So we are still left without much of an explanation of this particular result. Answer to Comment 3: Thank you very much for pointing out that the explanation was not clear enough regarding children’s irony comprehension which also failed to correlate with the TB performance. We now elaborate that ironic utterances might also become conventional when used with high frequency and familiarity (e.g., “That’s just great!” or in German language “Du bist ja clever” meaning “You are [German modal particle] clever”, which is similarly used as ironic utterance in one of the two story lines in Exp. 1). Whether and which ironic utterances are conventionalized, might differ between children. So, this is only one speculative explanation for the missing relationship of TB performance and metaphor and irony comprehension. Future work will therefore need to continue to investigate, first, whether the development of comprehension of familiar and unfamiliar ironic utterances does differ (see Burnett, 2015 for differences in comprehension of speaker meaning but not of speaker attitude). Second, and in direct relation to our project, future studies will need to test whether an operationalization with rather unfamiliar ironic utterances and novel metaphors may succeed in measuring children’s abilities for pragmatic inferences and recursive intention ascription and finding a relationship to children’s TB performance. This is stated with reference to both irony and metaphor understanding in the new version of our manuscript (see page 45-46). 4. My two cents on the matter is that the relevant pragmatic abilities involved in understanding academic questions on the one hand and irony and metaphor on the other seem quite different. What makes irony (and perhaps some metaphors) hard is that the primary intention of these speech acts – i.e. the speaker meaning – doesn’t correspond with their secondary intention or sentence meaning. But the speaker meaning and sentence meaning in the TB test question are aligned. The issue is instead related to the fact that the answer to the question is already in the common ground, and so it’s not obvious to the child what the speaker’s conversational goal is. While there is a broad sense in which both of these obstacles are pragmatic in nature, it’s not obvious that what you need to know in order to overcome these obstacles is the same in each case, or that they would stem from similar kinds of experiences. Tentatively, one might even conclude that using metaphor and irony comprehension as a general measure of advanced pragmatic abilities might be misguided, or that pragmatic ability is not a unified phenomenon (see again the Geurts et al. chapter cited above). Answer to Comment 4: We follow your thoughts on the distinction of pragmatic abilities that are involved in irony (and metaphor) understanding versus pragmatic abilities that are involved in the understanding of the speech act in the TB test. We recently had similar concerns; however, we did not concentrate too much on this difference before we conducted the study as the field of pragmatics comprises a wide spectrum and allows for various foci of analyses. Another level of analysis is, for example, to compare illocutionary and perlocutionary aspects of the TB test question and metaphoric/ironic utterances. However, as we cannot include all of these analyses in the discussion of our manuscript, we now decided to add the issue you raised as a further discussion point in the manuscript and emphasize the need of future work on this (see page 46). We want to thank you for sharing your thoughts on that with us so that we were able to elaborate this in our manuscript. 5. The other suggestion in the discussion that future studies might focus on purely nonverbal forms of communication also struck me as odd given that children’s initial failures really do seem to be so closely tied to their understanding of the experimenter’s speech act. There are no trivial responses or academic questions in a nonverbal stag hunt game, and it’s not terribly obvious what the one thing has to do with the other. This is not to say that looking into the relationship between these tasks and TB tasks would be fruitless (perhaps it is a better measure of recursive mindreading), but it seems a bit odd for this to occupy so much space in the discussion. On a related note, I found myself wondering whether this failure to detect a reliable correlation with more distally related pragmatic abilities might suggest a different future direction: finding a way to operationalize the specific type of pragmatic competency that underlies TB task performance, according to the pragmatic development account (e.g. understanding of trivial or academic questions). This would be in line with the broader theory Answer to Comment 5: Thank you very much for raising this crucial point. It made us realize that we did not introduce the relevant background and rational in sufficiently clear ways in the previous version. It was indeed misleading to present non-verbal measurements as an alternative measure for children’s verbal pragmatic language understanding. What we actually wanted to argue (and what we argue for now in much clearer ways) was the following hierarchical structure: non-literal language understanding might be one particular instance of recursive communitive intention ascription more generally, and that another option to test this more general capacity of recursive intention ascription that avoid inferences of individual language knowledge might be non-verbal Stag Hunt Scenarios. We now changed this in the manuscript and make clear that this is an alternative measure of recursive, higher-order mindreading (the same capacity that might also be applied in non-literal language understanding) and not an alternative measure of verbal non-literal language understanding itself (see page 46-47). Thank you very much for helping us to make this clearer for the reader. We hope that – taking these considerations and improvements into account – it is suitable to occupy some space for this in the discussion. Minor correction: 6. On p. 9, a contrast is drawn between children’s recursive mindreading abilities as measured in Liddle and Nettle and adults’ recursive mindreading abilities as measured by O’Grady et al. It is implied that between 10-11 years of age and adulthood, we go from chance on fourth-order recursive ToM tasks to success on 7th-order recursive ToM tasks. But O’Grady and colleagues use a very different, “implicit” measure of recursive mindreading, so these two results are not directly comparable. This should be corrected. Answer to Minor Comment/ Comment 6: Thank you for pointing out that this indeed needs to be reported in more nuanced ways. O’Grady and colleagues do actually use both explicit and implicit presentation of material (video clips of social interactions versus narrated story lines) and both explicit and implicit answer formats (video clips of social interactions versus read out recursive answer sentences). They conclude that adults are able to mindread to at least seven levels of embedding, both explicitly and implicitly. However, their data suggest that mindreading may be easier when stimuli are presented implicitly rather than explicitly (O’Grady et al., 2015). We added this information that O’Grady et al. (2015) in comparison to earlier reported studies (e.g., Liddle & Nettle, 2006) used both implicit and explicit stimuli and that children performed higher when stimuli were presented implicitly (see page 12). Reviewer #2 Summary The manuscript presents three experimental studies with 6- to 10-year-olds. The aim of these experiments is to assess whether children’s performance in a standard True-belief (TB) task can be predicted by their pragmatic skills, their recursive ToM skills, or more general recursive abilities. This investigation addresses an open question in developmental research: why do children’s performance in the TB task is characterized by a U-shaped developmental trajectory? More specifically, it focuses on the question of which cognitive skills determine the end of the U- shaped performance curve. Overall evaluation The studies are methodologically rigorous and well described, the statistical analyses are appropriately reported, and they certainly deserve to be published in PLOS ONE. I have some recommendations to revise the manuscript before publication, which I list below. These concern the Introduction and the General discussion, and more specifically, the presentation and discussion of the notion of ‘advanced pragmatic abilities’, and their alleged role in the experimental puzzle under discussion. Comments 1. First of all, I believe the introduction should include a more explicit description of the pragmatic analysis of children’s performance in the TB task, and the extent to which the analysis provided meshes well with the available experimental evidence in developmental pragmatics. The authors suggest that “once children develop the prerequisite ToM capacities, they develop an understanding for pragmatics” (149-150) Before that, they would “mostly use and interpret language literally” (139). This analysis offers an overly-simplified picture of pragmatic development. On the one hand, recent work in pragmatics shows that children can interpret language non-literally before the age at which they typically pass the standard FB task. For instance, work from Nausicaa Pouscoulous and colleagues has shown that children are able to understand metaphors already at the age of 3 (see, e.g., Pouscoulous & Tomasello, 2020). Similar results have been found when looking at metonymy understanding, which also displays an interesting U-shaped curve (see, e.g., Falkum, Recasens & Clark, 2017; Köder & Falkum, 2020). On the other hand, some pragmatic phenomena, such as irony understanding, emerge later in development (for a discussion, see Pexman, forthcoming). Given the relevance of pragmatic skills to the issue under discussion, I suggest reviewing in more detail the experimental literature on the development of pragmatics. Most importantly, the authors should give more prominence to the distinct developmental trajectories that characterize different pragmatic phenomena, without assimilating metaphor and irony. This issue is particularly salient in the section “Advanced pragmatics understanding” where these phenomena are discussed under the umbrella term “indirect speech acts”, which would “involve a mismatch of sentence and speaker meaning” (208-209). Answer to Comment 1: The picture of the pragmatic analysis presented in these lines is indeed simplified in order to present the general idea in a nutshell. However, we now explicitly point to this simplification and also point to evidence that contradicts the simplified version of the idea. Furthermore, we added a more explicit description of the pragmatic analysis of children’s performance in the TB task and its parallels to the understanding of non-literal language as well as a short overview of the available experimental evidence in developmental non-literal language comprehension in the more detailed part “Advanced Pragmatic Understanding” in the Introduction. In the General Discussion, we partly revise and extend this analysis as it became more fine-grained over the course of the studies. We appreciate that you provided us with several sources to improve our manuscript. 2. Furthermore, this section includes a very brief mention of the literature in clinical pragmatics and suggests that the pragmatic difficulties shown by individuals with ASD are fundamentally related to their ToM impairments. As this issue is vigorously debated in the current pragmatic literature, I suggest a more nuanced and informed discussion of the relevance of data from ASD to the issue under discussion (see, e.g., Andrés-Roqueta & Katsos, 2017; Kissine 2021; Mazzarella & Noveck, 2021). Answer to Comment 2: Thank you for this comment. We tried to present a more nuanced and up-to-date discussion of the current literature on neurotypical development of pragmatics in general (see answer to comment 1). With regard to pragmatics in clinical populations specifically, we tried in a first step to present an extensive overview of empirical findings as well as a summary of the ongoing debate in the current pragmatic literature. However, it then became apparent that this made the paragraph about pragmatics in clinical population very long in relation to the rest of the part about Advanced Pragmatic Understanding as the relationship between impairments in Theory of Mind and pragmatic understanding in clinical populations is not as conclusive as we presented it to be in the earlier version of our manuscript. We therefore now changed the manuscript in that we focus on the theoretical considerations and empirical findings on pragmatics development in neurotpyical populations only as it has yet to be clarified whether the relation between Theory of Mind abilities and pragmatics in neurotypical and neuroatypical development is comparable at all. (see page 10). 3. Finally, I think the authors should more convincingly discuss what kind of advanced pragmatic abilities are expected to be involved in the TB task and why, and, in light of this, better justify the choice of the selected measurements (metaphor and irony understanding). Answer to Comment 3: Thank you very much for this comment. Reviewer 1 mentions a similar point in their review (see comment 4 by Reviewer 1). We now review what kind of pragmatic abilities might be involved in the TB task and how these abilities might diverge from the pragmatic abilities involved in metaphor and irony understanding (see page 46). We also discuss in the new version of our manuscript that the development of these different underlying abilities might diverge as advanced pragmatic development cannot be necessarily seen as a unified phenomenon (see e.g., Geurts et al., 2019). It is thus subject to future work to examine both theoretically and empirically which kind of advanced pragmatic abilities are involved in the TB task. We therefore now consider in our new version of the manuscript a broad range of future studies including measures of non-literal speech acts as well as other measures of verbal advanced pragmatic understanding and of non-verbal communication. 4. These changes in the Introduction should also be reflected in a revision of the General discussion. Answer to Comment 4: We addressed the issues discussed above also in the General Discussion. Minor comments: 5. 126-129: It would be useful to include more references to empirical data showing that “academic question formats are difficult to grasp for young children”. Answer to Minor Comment/ Comment 5: Thank you. Unfortunately, the main empirical source usally cited in this context is an unpublished manuscript (Pemer, J., Leekam, S. R., & Wimmer, H. (1986). The insincerity of conservation questions: Children’s growing insensitivity to experimenters’ epistemic intentions). We therefore included some other references that show that the epistemic status of the interviewer can have an impact on the children’s answering behavior (see page 5). 6. Fig. 2 The visual animation seems very difficult to interpret, and it does not strike as an intuitive representation of the story “the video dilemma”. Could you please better justify the choice of this visual representation format and spell out more explicitly its appropriateness for the target age groups Answer to Minor Comment/ Comment 6: Thank you for this comment. We changed Figure 2 in order to show more clearly how the (indeed very long and complex) answer options were presented appropriately for the target age group (see page 19). Children heard the sentences presented slowly and sectional (in parts) while the according protagonists and their thoughts were represented visually on screen. Children were allowed to hear the answer options up to four times. Figure 2 represents a rather complex example (Level 4 mental state reasoning). Only older children were expected to solve this question. Before that point during the test session, children already heard the second- and third-order mental state questions as well as control questions in which they got used to the format as they heard stepwise longer and longer sentences each time. Typos • 354: Delete “were” - done • 717: Delete “that” - done 31 Mar 2022 How do Children Overcome Their Pragmatic Performance Problems in the True Belief Task? The Role of Advanced Pragmatics and Higher-order Theory of Mind PONE-D-21-34148R1 Dear Dr. Schidelko, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Jérôme Prado Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: (No Response) ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: (No Response) ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: (No Response) ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: (No Response) ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: (No Response) ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Evan Westra 5 Apr 2022 PONE-D-21-34148R1 How do Children Overcome Their Pragmatic Performance Problems in the True Belief Task? The Role of Advanced Pragmatics and Higher-order Theory of Mind Dear Dr. Schidelko: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Jérôme Prado Academic Editor PLOS ONE

20 in total

5. Beliefs about beliefs: representation and constraining function of wrong beliefs in young children's understanding of deception.

Authors: H Wimmer; J Perner
Journal: Cognition Date: 1983-01

6. The faculty of language: what's special about it?

Authors: Steven Pinker; Ray Jackendoff
Journal: Cognition Date: 2005-03

7. Exploring the role of conventionality in children's interpretation of ironic remarks.

Authors: Debra L Burnett
Journal: J Child Lang Date: 2014-12-11

8. An advanced test of theory of mind: understanding of story characters' thoughts and feelings by able autistic, mentally handicapped, and normal children and adults.

Authors: F G Happé
Journal: J Autism Dev Disord Date: 1994-04

9. Further development in social reasoning revealed in discourse irony understanding.

Authors: Eva Filippova; Janet Wilde Astington
Journal: Child Dev Date: 2008 Jan-Feb

10. Syntactic Recursion Facilitates and Working Memory Predicts Recursive Theory of Mind.

Authors: Burcu Arslan; Annette Hohenberger; Rineke Verbrugge
Journal: PLoS One Date: 2017-01-10 Impact factor: 3.240