Florian Goller (1), Donghoon Lee (2), Ulrich Ansorge (1), Soonja Choi (3)
(1) Faculty of Psychology, University of Vienna, Austria
(2) Department of Psychology, Pusan National University, Republic of Korea
(3) Department of Linguistics and Asian/Middle-Eastern Languages, San Diego State University, United States of America
Abstract
Languages differ in how they categorize spatial relations: While German differentiates between containment (in) and support (auf) with distinct spatial words, as in (a) den Kuli IN die Kappe stecken ("put pen in cap") and (b) die Kappe AUF den Kuli stecken ("put cap on pen"), Korean uses a single spatial word (kkita) that collapses (a) and (b) into one semantic category, particularly when the spatial enclosure is tight-fit. Korean uses a different word (i.e., nehta) for loose-fits (e.g., apple in bowl). We tested whether these differences influence the attention of the speaker. In a crosslinguistic study, we compared native German speakers with native Korean speakers. Participants rated the similarity of two successive video clips of several scenes where two objects were joined or nested (either in a tight or loose manner). The rating data show that Korean speakers base their rating of similarity more on tight- versus loose-fit, whereas German speakers base their rating more on containment versus support (in vs. auf). Throughout the experiment, we also measured the participants' eye movements. Korean speakers looked equally long at the moving Figure object and at the stationary Ground object, whereas German speakers were more biased to look at the Ground object. Korean speakers also looked more at the region where the two objects touched than did German speakers. We discuss our data in the light of crosslinguistic semantics and the extent of their influence on spatial cognition and perception.
Does our everyday spatial language influence our perception and cognition? More
specifically, does language-specific semantic categorization of spatial relations
affect our nonverbal categorization and visual attention to objects? In this study,
we investigate these questions comparing German and Korean, two languages that
differ significantly in the way they categorize spatial relations. Here, we study
for the first time if these language differences also lead to differences in how
attention is deployed to Figure versus Ground
objects in action recognition.

Objects can relate to one another in different ways. As shown in Figure 1, an object can be contained, supported
(on a horizontal or vertical surface), attached or covered by another, or it can fit
with the other tightly or loosely. However, languages differ considerably
in the way they classify these relations (Bowerman, 2007; Choi &
Hattrup, 2012), not only across unrelated languages (Levinson, Meira, & The Language and Cognition
Group, 2003) but also among related languages, for instance, Germanic
languages (Majid, Jordan, & Dunn, 2015).
For example, Majid et al. (2015) reported
that, among the twelve Germanic languages that they investigated, German and English
belong to different language clusters in the way they categorize various types of
spatial relation in static scenes. They further reported that the degree of
crosslinguistic differences in the spatial domain is significantly more extensive
than in other semantic domains, such as body part terms or color terms (Majid et al., 2015). Therefore, space is a good
testing ground for investigating the relationship between language and cognition.
Figure 1.
Examples of the different spatial relations used in the current study. From
top to bottom, example scenes for loose-on, tight-on, tight-in, and loose-in
are shown. Left (in orange) is the Korean categorization, right (in purple)
the German categorization.

The debate on whether language shapes perception and cognition has continued over
centuries and has become a core matter in cognitive science, particularly in recent
years (Gentner & Goldin-Meadow, 2003;
Gleitman & Papafragou, 2013; Wolff & Holmes, 2011). In this debate,
crosslinguistic comparisons in the spatial domain have provided critical data. On
the one hand, studies have reported data supporting a version of Whorf’s
(1956) hypothesis, namely that language significantly influences the way we perceive
and categorize the world (Boroditsky, Fuhrman, &
McCormick, 2011; Levinson, Kita, Haun,
& Rasch, 2002; Pederson et al.,
1998). On the other hand, studies have supported a modular
theory claiming that cognition is universal, independent of language,
and thus is unaffected by language-specific grammar (Gleitman & Papafragou, 2013; Li
& Gleitman, 2002; Munnich, Landau,
& Dosher, 2001). According to the latter view, any influence of
language on perceptual or cognitive tasks is due to the mediation of language during
the tasks, which can be suppressed by a concurrent linguistic activity (e.g., verbal
interference). Therefore, effects of language on cognition are thought to be rather
shallow (as they happen only online while carrying out a specific task in a specific
condition) and do not permeate the underlying universal cognitive organization
(Gleitman & Papafragou, 2013; Landau, Dessalegn, & Goldberg, 2010).

However, the depth of language influence may depend on the semantic domain. Recent
studies (Athanasopoulos & Bylund, 2013;
Choi & Hattrup, 2012; Lupyan, 2009; Thierry, Athanasopoulos, Wiggett, Dering, & Kuipers, 2009) that have
examined perception and cognition (e.g., eye movements, memory, similarity judgment)
in different domains showed that language effects are more automatized and
internalized than the modularist view claims. In particular, Choi and Hattrup
(2012) reported that, in a spatial
categorization task, English and Korean speakers showed significant, linguistically
relevant differences regardless of a “language-interference”
condition, where verbal thinking was actively suppressed. The study suggests that at
least in the domain of spatial categorization, language has permeated and become an
integral part of nonverbal cognition. Another element to consider in the language
and cognition debate is that within a semantic domain (e.g., spatial categorization)
both universal perceptual/cognitive tendencies and
language-specific components may contribute to its organization, such that
language-specifics affect some parts of a semantic domain more than others. For
example, in spatial categorization, languages may categorize containment relations
crosslinguistically similarly while they categorize support relations more diversely
(Choi & Hattrup, 2012; Levinson et al., 2003; Yun & Choi, under revision). In sum, recent studies on
language and cognition have revealed that the interaction between the two is highly
complex.

In investigating nonlinguistic behaviors related to spatial perception and
cognition, researchers have studied participants’ nonverbal categorizations
and eye movements. To assess categorization, studies have examined
participants’ intuitive judgments about how similar spatial scenes/events
are, either by forced choice or by rating degree of similarity (Choi & Hattrup, 2012; Engemann, Hendriks, Hickmann, Soroli, & Vincent, 2015;
Gennari, Sloman, Malt, & Fitch,
2002). Studies have also measured participants’ eye movements to
specific areas of interest that are linguistically relevant (Papafragou, Hulbert, & Trueswell, 2008; Soroli, Hickmann, Hendriks, Engemann, & Vincent,
2015). Note that making a judgment or rating a degree of similarity
involves controlled processes, determined by the instructions, whereas eye movements
are not controlled by the instructions alone (cf. Flecken, Gerwien, Carroll, & von Stutterheim, 2015; Papafragou et al., 2008; Van Bergen & Flecken, 2017). Examining both types of
behavior thus captures two partly independent ways in which language influences
spatial cognition; that is, it measures language effects more exhaustively.

In the present study, we measured both similarity ratings and eye movements to
assess the relationship between language and nonverbal spatial categorization.
Previously, Choi and Hattrup (2012) used a
triad design where participants first saw one target event in the middle of the
screen for a few seconds. Then, the next screen appeared with two choice events
presented simultaneously, one on the left and one on the right side of the screen.
Participants were asked to choose which of these two choice events was more similar
to the target event, thus engaged in a two-alternative forced-choice (2AFC)
similarity judgment task. In contrast, participants in the current study compared two
events directly to each other (rather than choosing one over the other in a 2AFC
task) and indicated how similar they were on a scale from 1 to 9.
We also measured eye movements to linguistically relevant areas of the
spatial events while participants engaged in this similarity-rating task. In
particular, we examined the allocation of attention to the Figure and the Ground
objects as well as the contact area between them (see below). Overall, drawing on
Choi and Hattrup’s (2012) results on
English and Korean, we expected to see a significant language-effect on the
similarity ratings and the eye movements.

In the following, we first present critical differences in the spatial semantics
between German and Korean and then present hypotheses about possible influences on
spatial perception and cognition.
Language-Specific Spatial Categorization in German and Korean
German and Korean differ in classifying dynamic spatial events, such as putting
an object into/onto another (see Figure 1).
They also differ in the morphology used to categorize spatial relations:
prepositions/particles in German and verbs in Korean. In German (similar to
English), a major distinction in spatial categorization involves whether an
entity is contained (geben in, “put in”) or supported (see Figure 1). Support relations are typically
expressed with geben auf, “put on”, whether they
involve horizontal support, attachment, or covering, forming an abstract
category of “support”. In contrast to German, in Korean, a major distinction
is made based on the degree of fit between linguistically defined Figure and
Ground objects. In linguistics, a “Figure object is a moving or
conceptually movable point whose paths or site is (…) variable
(…),” while the “Ground object is a reference-point, having
a stationary setting within a reference-frame, with respect to (…) the
figure” (Talmy, 1978, p. 627). In
particular, when a Figure object fits tightly with the Ground object (e.g., put
rings tightly on poles; put pegs tightly into matching holes) Korean speakers
use the same expression, kkita (or kkiwu-ta,
with the causative suffix -wu), “fit
tightly/interlock,”
collapsing across containment or support into one semantic category (see Figure 1; Bowerman & Choi, 2003; Yun &
Choi, under revision). When the relation does not involve a
tight-fit, a distinction is made between loose containment
(nehta) and loose support (nohta). Yet
again, Korean differs from German: The category of nehta (a
word generally referring to loose containment) includes loose encirclement as
well, for example, a big ring on a thin pole. Thus, the
division between containment and support is again blurred in the two loose-fit
categories in Korean (nehta and nohta). Figure 2 summarizes how German and Korean
semantically categorize the four types of spatial relation, loose-in, tight-in,
loose-on, and tight-on, and shows the primary difference between the two
languages: While German categorizes in terms of containment and support, thereby
distinguishing between tight-in and tight-on, Korean collapses the two tight-fit
relations into one semantic category, kkita.
Figure 2.
Depicted is an abstract representation of the category memberships of or
similarities between different video depictions of diverse spatial
relations that were used in the present study. The major point of
interest is that spatial relations in one and the same video that are
similar according to Korean language (enclosed by the blue circle) fall
into separate categories in German (red circles).

It is important to note that when tight-fit is involved, Korean speakers use the
verb kkita to denote tight-fitness between the Figure and
Ground objects, disregarding the topological spatial relations between them, for
example, containment or support. In contrast, German speakers consistently
encode the topological relation between the Figure and the Ground, regardless of
the degree of fit. Sentences 1A-2B (see Table
1) illustrate these crosslinguistic differences. Consider events of
joining a pen cap and a pen: One can either move the cap to the pen or move the
pen to the cap. Of course, one can also move both objects to join them, but the
current study does not concern symmetric movement. In German, to express that
one moves a pen cap (as a Figure) to cover a pen, one encodes the spatial
relation with AUF (“on”) as in Sentence 1A, but when one moves a
pen to insert it into a pen cap, one expresses it with IN (“in”)
as in Sentence 1B. But Korean typically uses the same spatial verb
kkita, regardless of which object is moving (Sentences 2A
and 2B), to denote the fitness between Figure and Ground.
Table 1.
Examples of crosslinguistic differences between German and
Korean
Note. SUBJ = Subject marker, OBJ = Object marker, LOC =
Locative marker, DECL = Declarative ending marker.

Spatial semantics are essential to our everyday language: We frequently
communicate with others about where things (Figures) are relative to a reference
point (Ground). Developmentally, infants explore the physical properties of
spatial relations (e.g., containment, support, tight-fit) virtually from the
beginning of life and start categorizing and generalizing them from the
preverbal period (Casasola & Cohen,
2002; Hespos & Baillargeon,
2001). Not surprisingly then, children produce spatial words from the
one-word stage onwards and use them according to their language’s
specific semantics (Choi, McDonough, Bowerman,
& Mandler, 1999). Furthermore, there is much evidence that
language-specific semantics influence or even guide nonlinguistic spatial
categorization from an early period (Casasola,
Cohen, & Chiarello, 2003; Choi,
2006; McDonough, Choi, & Mandler,
2003) into adulthood (Choi &
Hattrup, 2012).

Given the fundamental nature of spatial cognition and early acquisition of
spatial expressions, based on previous findings, we hypothesized that the
critical differences summarized above between German and Korean—in
spatial semantic categorization and (non-) distinction between linguistically
defined Figure and Ground—have significant effects on speakers’
nonverbal spatial categorizations, particularly in those behaviors that are
directly related to the linguistic expressions in question, namely similarity
ratings and eye movements to objects in dynamic spatial events.
Specific Predictions
If language influences spatial perception/cognition, we predicted the
following results: In similarity ratings, we predicted that compared to
Korean speakers, German speakers give a higher rating for the
tight-in/loose-in pair and for the tight-on/loose-on pair but a lower rating
for the tight-in/tight-on pair. We expected no significant differences
between the two language groups for the tight-on/loose-in pair, a pair of
relations that differ in both the tight-loose and the containment-support
dimensions.

For eye movement behaviors, we expected Korean speakers and German speakers
to differ in (a) the amount of looking to Figure versus Ground objects, and
(b) areas of contact between Figure and Ground. We also expected that these
crosslinguistic differences are particularly pronounced in tight-fit events
compared to loose-fit events (as it is the tight-fit domain which the two
languages categorize differently, see Figure
1), such that Korean speakers will attend to Figure and Ground
equally often to ascertain the tight-fitness between the two objects,
whereas German speakers may bias their attention to the Ground because the
Ground is more likely to provide critical information about the topological
relation: A concave container as Ground will feature a containment relation
whereas a non-container Ground (e.g., flat or convex surface) will result in
a support relation. With respect to contact areas, Koreans should attend to
them much more than German speakers do, again particularly for tight-fit
events.
Methods
Participants
We tested 15 participants (nine female, six male; mean age = 21.09 years,
SD = 1.45) who were recruited among students of Pusan National University
(Republic of Korea) and 15 participants (ten female, five male;
mean age = 23.36 years, SD = 3.48) who were recruited among students of the
University of Vienna
(Austria). The sample size was based on an a priori power calculation using
G*Power (Faul, Erdfelder, Buchner, & Lang,
2009), assuming a moderate effect size and a statistical power of
80%. This power analysis was based on a design with one two-level
between-participants factor and one two-level within-participant factor. Based on
the literature, an interaction between language and fitness (tight-fit versus
loose-fit) was reasonable to assume. We conducted only one general power
analysis for all data analyses reported in this paper.

All participants were native speakers of their respective language and were
raised monolingual. Furthermore, all participants were naïve with respect
to the research hypothesis, had normal or corrected to normal visual acuity, and
received partial course credit. We adhered to the Declaration of Helsinki and to
the ethical guidelines for human subject testing of the respective universities.
Informed consent was obtained from all participants and, together with a
language survey, a full debriefing followed the experiment. From the
Korean-speaking sample, one participant was excluded due to excessive eye blinks
which resulted in a data loss of more than 75% for that participant. One
additional participant from the Korean sample and one participant from the
German sample were excluded because the language survey indicated a bilingual
upbringing. The final sample consisted, therefore, of 13 Korean speakers and 14
German speakers.
Apparatus
In the Pusan and the Vienna laboratories, we tested our participants under very
similar conditions. All videos were displayed on a 19 in. monitor at a
resolution of 1,024 × 768 pixels and a vertical refresh rate of 60 Hz.
Viewing distance was kept stable at 64 cm by chin and forehead rests. Eye
movements of the participants’ dominant eye were recorded using an
EyeLink 1000 Desktop Mount eye-tracker (SR Research Ltd., Kanata, Ontario,
Canada) at a sampling rate of 1,000 Hz and an average accuracy of 0.15° of
visual angle. The eye-tracker was calibrated using a 13-point calibration
procedure. Prior to each trial, a drift check was performed, requiring
participants to fixate on a centrally presented target circle. The eye-tracker
was recalibrated if the average recorded gaze position fell outside a 4° radius around
the pre-trial drift check target circle. The experimental procedure was
implemented in Experiment Builder (SR Research Ltd., Kanata, Ontario, Canada),
and the experiment was run on a computer under the Windows operating system.
Manual responses were recorded as button presses with the right index finger on
a keyboard.
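
To relate the 4° recalibration criterion to the display geometry reported above, the following is a minimal sketch of the conversion from visual angle to screen pixels. It assumes a 4:3 panel (consistent with the 1,024 × 768 resolution, but not stated explicitly), so that the screen width is 4/5 of the 19 in. diagonal; the function name is hypothetical and is not part of the Experiment Builder setup.

```python
import math

def degrees_to_pixels(angle_deg, viewing_distance_cm, screen_width_cm, screen_width_px):
    """Convert an eccentricity in degrees of visual angle to on-screen pixels."""
    size_cm = viewing_distance_cm * math.tan(math.radians(angle_deg))
    return size_cm * screen_width_px / screen_width_cm

# Assumption: 19 in. diagonal with a 4:3 aspect ratio, so width = 4/5 of the diagonal.
SCREEN_WIDTH_CM = 19 * 2.54 * 4 / 5

# Radius of the 4 degree drift-check criterion at the reported 64 cm viewing distance.
radius_px = degrees_to_pixels(4.0, 64.0, SCREEN_WIDTH_CM, 1024)
print(f"A 4 degree radius corresponds to roughly {radius_px:.0f} px")
```

Under these assumptions, the criterion corresponds to a circle of a little under 120 pixels in radius around the drift check target.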
Stimuli
Among all possible pairs of combinations involving the four spatial relations
(tight-in, tight-on, loose-in, and loose-on), we selected four pairs (see
Table 2). More specifically, we
focused on three pairs (1-3 in Table 2) for which the two languages differ
in semantic categorization and included one pair for which the two languages
do not differ. In Pair 4, both languages distinguish the two relations
(tight-on vs. loose-in) as they are maximally different in that they share
neither tight-fit nor containment (or support) features.
Table 2.
Semantic Categories of the Stimuli Pairs as a Function of Language

Stimuli pair              Semantic category in German   Semantic category in Korean
1. tight-in / tight-on    different (IN/AUF)            same (kkita)
2. tight-in / loose-in    same (IN)                     different (kkita/nehta)
3. tight-on / loose-on    same (AUF)                    different (kkita/nohta)
4. tight-on / loose-in    different (AUF/IN)            different (kkita/nehta)

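
The same/different pattern in Table 2 follows from two simple rules described in the introduction: German encodes topology (IN vs. AUF) regardless of fit, whereas Korean uses kkita for any tight fit and otherwise distinguishes nehta from nohta. The sketch below is only an illustration of that logic; the function names are hypothetical, and the rule for Korean is a simplification (e.g., it ignores loose encirclement, which also falls under nehta).

```python
def german_category(fit, topology):
    """German: the preposition depends on topology only (simplified)."""
    return "IN" if topology == "in" else "AUF"

def korean_category(fit, topology):
    """Korean: tight fit is always kkita; loose fit splits by topology (simplified)."""
    if fit == "tight":
        return "kkita"
    return "nehta" if topology == "in" else "nohta"

def compare(pair, categorize):
    """Return 'same' if both relations of a pair share one semantic category."""
    first, second = pair
    return "same" if categorize(*first) == categorize(*second) else "different"

# The four stimulus pairs of Table 2, each relation coded as (fit, topology).
PAIRS = [
    (("tight", "in"), ("tight", "on")),
    (("tight", "in"), ("loose", "in")),
    (("tight", "on"), ("loose", "on")),
    (("tight", "on"), ("loose", "in")),
]

for pair in PAIRS:
    print(pair, compare(pair, german_category), compare(pair, korean_category))
```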
We created a set of 32 videos (eight videos for each type of relation), each
lasting for 4 s. We made multiple videos with different objects for each of
the four spatial relations (tight-in, tight-on, loose-in, loose-on), each
video containing a simple manual action, such as putting playing cards on a
table (loose-on) or putting corks in bottles (tight-in, see Appendix 1). All
actions were performed by a single female performer. The performer was
dressed in black and filmed in front of a black background. In all videos,
only her hands were visible.

Each video consisted of three Figure objects (e.g., three cards) and one or
more larger Ground objects (e.g., a table). All videos started with the
first Figure already placed in or on the Ground (e.g., from the start, one
sees the first card on the table). The performer put the second and the
third Figures serially on or in the Ground over the course of the video.
This redundancy in the spatial action in the videos (i.e., having three
Figures) was intended to help the participants to fully perceive the action
and the relation involved in the spatial event in question, which would be
critical for performance in the later rating task. The performer’s
hand holding an object came into view from the top of the video screen. On
average, the hand with the second Figure appeared on the screen about 100 ms
after video onset, and the hand with the third Figure appeared on the screen
about 1 s after video onset. Due to the diversity of objects used in our
videos, these timespans varied between the videos.

The videos were shot using a Canon EOS 550D at a frame rate of 50 frames/s.
The lighting conditions were kept constant for all videos. We decided to use
grayscale videos to minimize the effects of salient colors that varied
across videos and are known to attract attention in an automatic manner
(Itti, Koch, & Niebur, 1998;
Theeuwes, 1991, 1992). We were concerned that strong salience effects of
this kind could have made the eye-movement behavior of our
participants so uniform as to mask language-specific
differences.
Procedure
All instructions were given in the native language of the participant, that
is, Korean for Korean participants and German for Austrian participants.
Each trial started with a central fixation dot that was used for the
eye-tracker drift check (see the Apparatus section). Afterwards,
participants were shown two videos in succession. The video pairs were
determined in advance to make sure that each spatial relation is equally
often compared to the other spatial relations. The presentation-order of the
respective videos in the pair (first versus second video) was, however,
random and counterbalanced across all participants within their language
group. The two videos were separated by a central fixation dot that was
shown for 2 s. After the second video, a rating scale, ranging from 1 to 9,
was presented on the screen. Participants were instructed to rate the
similarity of the two videos they just saw. Importantly, participants were
not told on what specific features or dimensions they should rate the
videos. They were encouraged to give an intuitive and quick rating. After
participants gave their rating, the next trial started. The overall
experiment lasted for about 40 min, including preparation, instructions, and
debriefing of the participant. After the main experiment, participants
filled out a survey to confirm their language background and, most
importantly, whether they were raised monolingual.
Eye-Tracking Coding and Data Processing
Eye-tracking samples were time-locked to the onset of each video. Since we
sampled eye position at 1,000 Hz and each video lasted 4 s, we had a possible maximum of 4,000 samples per
stimulus. However, we had to exclude all samples that were recorded during eye
blinks. Furthermore, we excluded all samples that were recorded during saccades.
Using the SR Research algorithm, saccades were identified as a change in the
recorded gaze direction of more than 0.15°, with an eye movement velocity
above 30°/s, and an acceleration exceeding 8,000°/s². Overall, we
had to exclude 17.27% of all samples, leaving us with an average of about 3,308
samples per video and participant to analyze.

We analyzed the time (i.e., the percentage of samples) that participants spent
looking at the Figure and the Ground. In line with the linguistic definition,
the Figure was always defined as the moving object, while the Ground was
stationary. Figure and Ground objects were hand-coded as interest areas (or
regions of interest) separately for each frame of each video. The Figure was
always in the foreground especially if it moved over the Ground. The only
exceptions are loose-in events where a concave container could partly obstruct
the view of a Figure that was put into it. Additionally, the hands and the
background were never part of the Figure or the Ground. See Figure 3 (left side) for an illustration.
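
The sample-based measure described above can be sketched as follows. This is an illustrative reconstruction, not the analysis script used in the study: the data layout (one tuple per 1 ms sample with an interest-area label and blink/saccade flags) and the function name are assumptions made for the example.

```python
def dwell_percentages(samples):
    """Percentage of valid samples per interest area.

    Each sample is a tuple (area, in_blink, in_saccade); samples recorded
    during blinks or saccades are discarded before counting.
    """
    valid = [area for area, in_blink, in_saccade in samples
             if not in_blink and not in_saccade]
    if not valid:
        raise ValueError("no valid samples left after exclusion")
    return {area: 100 * valid.count(area) / len(valid) for area in set(valid)}

# Toy data for one hypothetical video: 10 samples, 2 of which are excluded.
toy = ([("ground", False, False)] * 4
       + [("figure", False, False)] * 2
       + [("other", False, False)] * 2      # hands or background
       + [("ground", True, False)]          # recorded during a blink
       + [("figure", False, True)])         # recorded during a saccade

print(dwell_percentages(toy))
```

In the toy data, eight samples remain after exclusion, half of which fall on the Ground.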
Figure 3.
Mean normalized ratings (y axis) as a function of
spatial relation pairs (x axis) and language (separate
bars: German speakers in dark grey and Korean speakers in light grey).
The numbers at the bottom of each bar indicate the mean for the
respective condition. Error bars represent the SEM.

The Ground was always bigger than the Figures (both measured in pixels),
t(31) = 4.26, p < .001,
d = 1.06. This result is based on the point in time where
all three Figures were already present in the videos. To ensure that the Ground
was consistently larger than the Figure, the size of the Figures was subjected
to an analysis of variance (ANOVA) with the between-participants variables
Fitness (tight; loose) and Topological Relation (IN; ON). No significant effects
were found, all Fs < 2.67, all ps >
.133, indicating that the size of the Figures did not vary significantly across
the different conditions. The same analysis on the Grounds rendered the same
result, all Fs < 0.55, all ps >
.249.

Because the Figure objects in each video were put in, on, or around a Ground object,
we also coded the Figure-Ground contact areas, separately from the Figure-Ground
analysis. This coding was also based on every frame of the videos. See
Figure 3 (right side) for an
illustration. By the end of the video, there were three contact areas present.
The contact area was drawn around the immediate area where Figure and Ground
joined or touched (such as the part of a container into which an object was
inserted; see also Figure 3). The
size of the contact areas varied, but this variation was comparable across all
conditions. This was confirmed by an ANOVA of the size of the contact areas with
the between-participants variables Fitness (tight; loose) and Topological
Relation (IN; ON), which yielded no significant results, all Fs
< 1.15, all ps > .201. Half of the defined contact area
was part of the Figure and the other half was part of the Ground. The definition
of contact areas was quite straightforward for tight-fit events because there is
a clearly defined, visible touching area. As before, loose-in events were a bit
problematic because the Ground partly obstructed the view of the Figure that was
put into it. In this case, the contact area only covered the visible part of the
Figure.
Results
Similarity Rating
To make the rating data more comparable between the different language groups, we
first normalized the data separately for each participant. Each individual
rating was recalculated as (V − min V)/(max V − min V), where V
represents the value of the rating in the original data set. This method allowed
us to have ratings with different means and SDs but equal ranges. For an
illustration of the results, see Figure
4.
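
The per-participant normalization (V − min V)/(max V − min V) can be illustrated with a short sketch. The function name and the example ratings are hypothetical; the handling of a participant who gave the same rating on every trial is also an assumption, since the formula is undefined in that case.

```python
def normalize_ratings(ratings):
    """Min-max normalize one participant's ratings to the range 0 to 1."""
    lowest, highest = min(ratings), max(ratings)
    if highest == lowest:
        # Assumption: such a participant cannot be normalized with this formula.
        raise ValueError("all ratings are identical; range is zero")
    return [(v - lowest) / (highest - lowest) for v in ratings]

# Hypothetical participant who only used the values 2, 5, and 8 of the 9-point scale.
print(normalize_ratings([2, 5, 8, 5]))
```

After normalization, every participant's lowest rating maps to 0 and the highest to 1, so that groups using different parts of the scale become comparable.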
Figure 4.
Examples of the analysis of Figure (green) and the Ground (red), left
side, and contact area (right side in blue). Figure was always defined
as the moving object, while Ground was stationary. The hands and the
black background were excluded from both coding methods.

The mean normalized ratings per participant and pair were subjected to a mixed
ANOVA, with the within-participant variable Pair (tight-in/loose-in;
tight-in/tight-on; tight-on/loose-in; tight-on/loose-on) and the
between-participants variable Language (German; Korean). If the Mauchly test
indicated that the assumption of sphericity was violated, p
values were adjusted with the Greenhouse-Geisser correction. Note that the
similarity ratings are based on pairs of video stimuli, not individual stimuli.
First of all, we found no main effect of language, F(1, 25) =
1.25, p = .228, ηp² = .06,
indicating that Korean and German speakers did not differ regarding their
average similarity ratings across different pairs. However, there was a main
effect of pair, F(3, 75) = 7.29, p = .001,
ηp² = .23, and an interaction between pair and
language, F(3, 75) = 8.02, p = .001,
ηp² = .24. The tight-in/loose-in pair was rated
as more similar by German speakers compared to Korean speakers,
t(25) = 3.31, p = .003, d
= 1.27. An analogous result was found for the tight-on/loose-on pair,
t(25) = 2.15, p = .042, d
= .83. Only the tight-in/tight-on pair was rated as more similar by Korean
speakers compared to German speakers, t(25) = 2.76,
p = .011, d = 1.06. As expected, there was
no significant difference in the tight-on/loose-in pair, t(25)
= 0.13, p > .249, d = .05.
We also checked whether there were significant differences between similarity
ratings of each pair. Pairwise Bonferroni corrected comparisons were performed,
separately for each language group. Korean speakers showed a significant
difference between the pair tight-in/tight-on when compared to all other pairs
(all ps < .007). No other differences were found for Korean
speakers (all non-significant ps > .249). In contrast, for
German speakers, we found significant differences between tight-on/loose-in and
tight-in/loose-in (p = .032) as well as tight-on/loose-in and
tight-on/loose-on (ps < .013). No other differences were
found for German speakers (all non-significant ps >
.249).
Eye-tracking Data
Looking time at Figure versus Ground
Of all eye-tracking samples, 71.80% were on either Figure or Ground. From
these data, we computed the proportion of samples (which corresponds to the
proportion of viewing time) that was directed to the Ground. The results are
illustrated in Figure 5. All
proportions were arcsine transformed to approximate homogeneity of the
variances. The proportions of samples directed to the Ground were subjected
to an ANOVA with the within-participant variables Fitness (tight; loose) and
Topological Relation (IN; ON) and the between-participants variable Language
(German; Korean). If the Mauchly test indicated that the assumption of
sphericity was violated, p values were adjusted with the
Greenhouse-Geisser correction. Note that, unlike the similarity ratings, the
analysis of the eye tracking data was based on individual stimuli and not on
pairs of stimuli.
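The computation of the dependent variable can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the source does not state which variant of the arcsine transformation was used, so the sketch assumes the common arcsine-square-root form, and the function names and sample counts are hypothetical.

```python
import math


def ground_proportion(n_figure_samples, n_ground_samples):
    """Share of Figure-or-Ground gaze samples that fell on the Ground."""
    total = n_figure_samples + n_ground_samples
    if total == 0:
        raise ValueError("no samples on Figure or Ground")
    return n_ground_samples / total


def arcsine_transform(proportion):
    """Arcsine-square-root transform of a proportion (assumed variant), in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Hypothetical sample counts for one participant in one condition.
p_ground = ground_proportion(n_figure_samples=300, n_ground_samples=700)
transformed = arcsine_transform(p_ground)
print(p_ground, transformed)
```

The transform stretches values near 0 and 1, which is why it is often applied to proportions before an ANOVA; the transformed values, not the raw proportions, would then enter the analysis.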
Figure 5.
Mean proportion of viewing time directed to the Ground
(y axis) as a function of spatial relation
pairs (x axis) and language (separate bars). The
numbers on the bottom of each bar indicate the mean for the
respective condition in percent. Error bars represent the
SEM.
The ANOVA yielded significant main effects of language, F(1,
25) = 11.71, p = .002, ηp2 =
.32, and topological relation, F(1, 25) = 72.77,
p < .001, ηp2 = .74.
These main effects, as well as the interactions between language and
fitness, F(1, 25) = 67.18, p < .001,
ηp2 = .73, and fitness and topological
relation, F(1, 25) = 6.20, p = .020,
ηp2 = .20, are best explained by resolving
the also significant interaction of all factors of this ANOVA,
F(1, 25) = 4.75, p = .039,
ηp2 = .16. Since this paper mainly focuses
on crosslinguistic differences, we resolved this three-way interaction based
on between-languages comparisons. We conducted such an analysis with
separate ANOVAs for loose-fit and tight-fit events. For the loose-fit
events, we found no significant results, all Fs < 2.47,
all ps > .129, indicating no crosslinguistic
differences. For the tight-fit events, we found a significant main effect of
language, F(1, 25) = 41.04, p < .001,
ηp2 = .62, showing that German speakers
looked significantly more often at the Ground (68.65%) than Korean speakers
did (49.03%). The interaction between language and topological relation
showed a non-significant numerical trend, F(1, 25) = 3.77,
p = .063, ηp2 = .13. Korean
speakers showed no significant difference between IN and ON events,
t(25) = −1.24, p = .237,
d = −0.49. The same was true for German
speakers, t(25) = 1.58, p = .139,
d = 0.60. No main effect of topological relation was
found for the tight-fit events, F(1, 25) = 0.02,
p > .249, ηp2 < .01.
In sum, these results indicate that in tight-fit events German speakers were
more biased towards the Ground than to the Figure while Korean speakers
distributed their viewing time more equally between Figure and Ground.
For the sake of completeness, we also analyzed whether each of the two
language groups differed across the levels of the factors Topological
Relation and Fitness. Therefore, we conducted separate ANOVAs for Korean and
German speakers with the within-participant factors of Topological Relation
and Fitness. Korean speakers showed an interaction between topological
relation and fitness, F(1, 12) = 9.68, p =
.009, ηp2 = .45. No main effects were found,
both Fs < .09, both ps > .249. In
the IN relation, they looked slightly more at the Ground in loose-fit
(51.04%) compared to tight-fit events (47.41%), t(12) =
2.52, p = .027, d = 0.99. In the ON
relation, we found the opposite: a smaller proportion of viewing time to the
Ground in loose-fit (46.36%) than in tight-fit events (50.66%),
t(12) = −2.26, p = .043,
d = −0.89. German speakers, in contrast, showed
only a main effect of fitness, F(1, 13) = 105.00,
p < .001, ηp2 = .89,
resulting from a lower proportion of viewing time to the Ground in
loose-fit (51.80%) than in tight-fit events (68.65%). No other effects were
found for the German speakers, F < 1.28,
p > .249 in all cases.
We also checked post hoc whether the video presentation order (first video or
second video of the pair) had an influence on gaze behavior. It was
reasonable to assume that the participants’ behavior might differ
between the first and the second video. In the first video, participants may
just freely look at the videos, while in the second video they may actively
compare it to the first video. Analyzing the data separately for the first
and second video may provide us with additional information that is not
captured by an analysis collapsed over the video presentation order.
Therefore, we repeated the main analysis separately for the first video and
for the second video that was presented. The results were essentially the
same, meaning that the crucial interaction between language, topological
relation and fitness was significant in both cases, F(1,
25) = 8.37, p = .008, ηp2 =
.25, for the first video and F(1, 25) = 6.08,
p = .004, ηp2 = .20, for
the second video.
Looking time at contact area
This analysis was based on the same data as the Figure-Ground analysis,
meaning that these two analyses are not independent. We consider this
analysis as both complementary to and more fine-grained than the
Figure-Ground analysis because it concentrates on an important area for
identifying the spatial relation. We computed the percentage of viewing time
of the whole video (4 s) that was directed to the contact area(s). Mean
arcsine transformed percentages were subjected to an ANOVA, with the
within-participant variables Fitness (tight-fit; loose-fit) and Topological
Relation (IN; ON), and the between-participants variable Language (German;
Korean). If the Mauchly test indicated that the assumption of sphericity was
violated, p values were adjusted with the
Greenhouse-Geisser correction. The results are illustrated in Figure 6. We found a significant
interaction between language and fitness, F(1, 25) = 24.82,
p < .001, ηp2 = .50,
indicating that Korean speakers looked less at the contact area in
loose-fit events (16.47%) than German speakers (20.16%),
t(25) = −2.27, p = .032,
d = −0.87. In tight-fit events, in contrast,
Korean speakers looked more at the contact area (20.92%) than German
speakers (15.67%), t(25) = 4.26, p <
.001, d = 1.64. Furthermore, we found an interaction
between topological relation and fitness, F(1, 25) = 5.31,
p = .030, ηp2 = .18. A difference
between IN and ON relations was present in the tight-fit events
(16.84% vs. 19.56%), t(26) = −3.54,
p = .002, d = −0.96, but not in
the loose-fit events (20.26% vs. 16.50%), t(26) = 1.42,
p = .169, d = 0.39.
Figure 6.
Mean proportion of viewing time directed to the contact area
(y axis) as a function of spatial relation
pairs (x axis) and language (separate bars). The
numbers on the bottom of each bar indicate the mean for the
respective condition in percent. Error bars represent the
SEM.
As before, we also conducted separate ANOVAs for Korean and German speakers
with the within-participant factors Topological Relation and Fitness. Korean
speakers yielded a significant main effect of topological relation,
F(1, 12) = 9.36, p = .010,
ηp2 = .44, as well as an interaction between
topological relation and fitness, F(1, 12) = 7.91,
p = .016, ηp2 = .40. No
main effect of fitness was found, F(1, 12) = 3.02,
p = .108, ηp2 = .20. For
loose-fit events, Korean speakers looked more at the contact area in IN
(19.81%) than in ON (13.12%) events, t(12) = −2.44,
p = .031, d = −0.96. For
tight-fit events, this effect was numerically yet non-significantly reversed
(19.92% vs. 21.92%), t(12) = 2.10, p =
.058, d = 0.82. German speakers, in contrast, yielded only
a main effect of fitness, F(1, 13) = 17.27,
p = .001, ηp2 = .57,
indicating a higher proportion of viewing time on the contact area in
loose-fit (20.16%) than tight-fit (15.67%) events. No other results were
found for the German speakers (F < 0.89,
p > .249 in all cases).
As in the Figure and Ground analysis, we checked whether the video
presentation order (first video or second video of the pair) led to
differential effects. Separate analyses for the first video and the second
video that was presented yielded essentially the same results as above. The
crucial interaction between language and fitness was significant for the
first video, F(1, 25) = 30.02, p <
.001, ηp2 = .55, and for the second video,
F(1, 25) = 11.43, p = .002,
ηp2 = .31.
General Discussion
We have examined possible influences of language-specific semantic categorizations of
spatial relations on three types of nonverbal behavior: similarity ratings, visual
attention to Figure and Ground, and amount of looking time to contact areas between
Figure and Ground objects. The results confirmed our overall hypothesis that
language-specific semantic categorization has a significant impact on these
behaviors.
As predicted, differences in similarity ratings between German and Korean speakers
corresponded to the differences in semantic categorization between the two
languages: Korean speakers perceived tight-fit relations (tight-in and tight-on) to be
significantly more similar to each other than did German speakers. In contrast,
German speakers perceived the two types of containment (tight-in and loose-in) and
the two types of support (tight-on and loose-on) to be significantly more similar
than did Korean speakers. In other words, the two language groups perceived the
degree of similarity along the dimension delineated by their language-specific
semantics.
In this study, we also examined possible relationships between spatial semantics and
visual attention to Figure and Ground. As predicted, Korean speakers’ looking
behavior was significantly different from German speakers’ behavior
particularly in relation to tight-fit events (i.e., tight-in and tight-on events for
which the two languages differ in their semantic categorization): Korean speakers
spent similar amounts of time looking at Figure and Ground whereas German speakers
looked at the Ground more than the Figure. To decipher the resulting topological
relationship (i.e., containment or support), which is relevant to the categorization
in German, the Ground gives more information than the Figure: A concave container as
Ground will result in a containment relation whereas a non-container Ground (e.g.,
flat or convex surface) will result in a support relation. It is interesting that
the longer looking time to the Ground by German speakers is restricted to tight-fit
events only. For loose-fit events, German speakers did not look more at the Ground
than at the Figure. Attending to Ground may be particularly necessary for tight-fit
events since in these events Figure and Ground interlock with each other and thus
the contour of individual objects is not salient. In contrast, these speakers do not
need to focus so much on Ground when viewing a loose-fit event because the identity
of the Ground is readily detectable.
There is another linguistic element (besides the German spatial prepositional
system) that may have influenced German speakers to allocate their attention more to
the Figure in loose-fit events in comparison to tight-fit events: In German, a
distinction is typically made between horizontal (liegen/legen) and
vertical (stehen/stellen) orientation of a Figure object relative
to the Ground. Such a linguistic distinction may have prompted German speakers to
attend to the Figure object (cf. Van Bergen &
Flecken, 2017). This may be the case in particular in loose-fit events
because either orientation is possible when an object is loosely put on a surface
(e.g., stellen/legen die Flasche auf den Tisch—“make stand/lay down a
bottle on the table”). To
distinguish between the two possibilities, a study could be conducted that
systematically crosses fitness (tight vs. loose) and posture (vertical vs.
horizontal).
The results of looking times to contact areas also showed significant differences
between the two languages. In tight-fit events, Korean speakers spent more time
looking at contact areas than did German speakers. This difference implies that
contact areas between Figure and Ground are important for Korean speakers for
assessing tight-fitness. By contrast, such an assessment is less important for
speakers of German, probably because all they need to identify is whether the Figure
goes into or onto Ground, which can be achieved by looking at the Ground, as
discussed above. In loose-fit events, overall, Korean speakers looked less at the
contact areas than German speakers. It is unclear how to interpret these data. For
loose-in events, both groups looked at the contact area rather extensively. It is
possible that speakers of both groups followed the Figure until it went to the
bottom of the container or support (i.e., the contact area). For loose-on events,
while German speakers showed a strong tendency to follow with their eye gaze until
the end point of the Figure’s trajectory (where the Figure touches the Ground
surface), Korean speakers did not. Perhaps Korean speakers are less interested in
continuing their eye gaze until the end point of motion, because the corresponding
verb in Korean nohta, “put loosely on surface,” (see Figure 1) also has the meaning of “release
x from hand grip.” Given the latter meaning of nohta, Korean speakers may
just have attended to the Figure being released from the performer’s hand. We
also mentioned that the definition of the contact area is rather difficult and less
strict for loose-fit events. Therefore, the results for the contact area in
loose-fit events should be taken with a grain of salt.
Overall, our eye-movement data have shown an interaction between language and spatial
relation: Crosslinguistic differences in nonlinguistic behaviors were significantly
more pronounced for tight-fit events than for loose-fit events. Interestingly, this
result corresponds to a recent study by Yun and Choi (under revision) on English and Korean, which reports greater
crosslinguistic differences in semantic categorization for tight-fit events than for
loose-fit events. Future studies need to examine the extent of this décalage
(i.e., higher degree of language-specificity for tight-fit relations than for
loose-fit relations) in other languages and explore possible cognitive implications
of the phenomenon.
This brings us to the limitations of the current study. Our participants performed
the similarity rating task in a silent environment. The study did not involve an
interference condition (e.g., repeating nonsense syllables), which would have
hindered or minimized possible verbally supported thinking during the task. However,
studies that juxtaposed silent and interference conditions during a nonverbal task
have reported conflicting results in terms of possible differences between the two
types of conditions and the type of interference that would effectively suppress
verbally supported thinking. For example, studies have reported that articulatory
tasks, such as counting numbers or repeating syllables, suppress verbal thinking
(i.e., show no impact of language on nonverbal tasks) while other tasks, such as
tapping, do not lead to the same result (cf. Gennari
et al., 2002; Trueswell & Papafragou,
2010). It should be pointed out that most of these studies investigated
the semantic domain of motion expressions in different languages, that is, whether a
language highlights the path of motion (e.g., into, out
of) or the manner of motion (e.g., walk,
run) in its grammar (but see Roberson & Davidoff, 2000; Winawer, Witthoft, Frank, Wu, Wade, &
Boroditsky, 2008, for studies on color; Meilinger
& Bülthoff, 2013, for a study on spatial memory). Interestingly,
in the domain of spatial categorization Choi and Hattrup (2012), who tested nonverbal similarity judgments in English and
Korean speakers, found a language effect in both silent and
interference (i.e., repeating syllables) conditions, and thus found no differences
between the two conditions. The present study, which has examined spatial
categorization in German and Korean speakers, should be extended further and include
differential conditions to examine whether the relationship between language and
cognition/perception differs across different semantic domains. Additionally, an
interference task could also answer the question of whether our results are
restricted to (nonconscious) verbal thinking or whether they extend to a deeper and
more general level of influence of language on spatial perception. However, such
experiments are beyond the scope of the present research, as we aimed to explore how
spatial relations might influence nonverbal categorization and visual attention to
objects involved in spatial events.
Additionally, the time course with which participants deploy their attention to
Figure and Ground might also be of interest for future studies. In the current
study, we defined all of the three Figures as one area of interest. A more
fine-grained analysis could define the different Figures (and Grounds as well as the
contact areas) as different interest areas. By doing so, one could answer the
question at what point in time the crosslinguistic differences in the viewing
behavior become apparent.
Last, one further extension of our experiment needs to be discussed. Our results were
obtained from a similarity rating task. We did not instruct participants to base
their rating specifically on the spatial relations but to give an intuitive rating.
However, participants might have picked up on spatial relation as an implicit rating
dimension as it was the only dimension that was consistently present in all videos.
All other perceivable dimensions (such as size or shape of the objects) varied
randomly across the videos and therefore may not have been a feasible basis for
a rating. As a result, we may have amplified the effects of spatial language on
perception. More compelling tests in the future could use a similarity rating task
with objects or actions allowing more than one classification to see if a
linguistically marked classification influences the ratings even where other obvious
but not crosslinguistically marked features (e.g., object colors, action directions)
would invite alternative categorizations of the objects. In addition, a control
condition in which no crosslinguistically marked actions or objects are used in a
similarity rating task could be employed to confirm that in such a condition no
differences between Korean and German speakers exist. However, note that the
rating task of the present study already contained one such condition, namely, the
comparison between tight-on and loose-in relations (i.e., Pair 4 in Table 2). These
two relations are different both in terms of fitness (tight vs. loose) and of
topological relation (ON vs. IN). Thus, the two relations should be categorized as
being different by both Korean and German speakers, a prediction that was supported
by the much weaker language-dependent behavioral differences in this condition.
In summary, the present study has shown that speakers of German and Korean diverge
significantly in nonverbal categorization and attentional behaviors in
correspondence to the semantic differences between the two languages. More
generally, the present study has shown that the spatial categorization we use every
day has a significant impact on our nonverbal behaviors that are directly relevant
to it—the way we nonverbally categorize spatial relations and the kinds of
things we pay attention to in a spatial event. To that extent, the present study
supports the Whorfian hypothesis (1956). Importantly, the present study has also
revealed that the language effect on nonverbal behaviors, specifically eye movement
behaviors, varies across subdomains: The effect occurred most prominently for
tight-fit relations, for which the two languages differed critically in their
semantic categorization. In comparison, nonverbal behaviors for loose-fit relations
did not generate significant crosslinguistic differences. As mentioned earlier, this
may reflect a higher degree of similarity in the way languages categorize loose-fit
relations than tight-fit relations.
As discussed earlier, studies have reported both universal cognitive/perceptual
tendencies and language-particular components in the way languages categorize the
semantic domain of space (Choi & Hattrup,
2012; Levinson et al., 2003; Yun & Choi, under revision). In particular,
Yun and Choi (under revision) have proposed
greater crosslinguistic differences in semantic categorization for tight-fit events
than for loose-fit events. The present study coheres with this proposal in that
crosslinguistic differences did not occur across the board in nonverbal behaviors,
but rather in the subdomain of tight-fit relations where language seems to be the
principal guide for categorization (Choi &
Hattrup, 2012). Thus, there is a complex interaction between
language-specific semantics and cognition/perception. However, we limit our claim on
the specific nature of interaction to the domain of space, and in particular the
domain of spatial categorization. To understand the relationship between language
and cognition in other domains, an in-depth analysis of the semantics of the target
languages in those domains should be conducted hand in hand with systematic
investigation of relevant cognitive and perceptual behaviors.