Intra-individual variation of extreme response style in mixed-mode panel studies.

Abstract

It is well known that the self-report survey method suffers from many idiosyncratic biases, such as varying response styles due to different survey modes used. Using latent state-trait theory it is argued that response styles will also vary intra-individually, depending on the particular survey situation. In this study we examine intra-individual variation in extreme response style behavior (ERS) using mixed-mode survey panel data as a quasi-experimental setting. Data from the Irish National Election Study panel are used, which consists of repeated face-to-face and mail-back surveys. Latent transition analysis is used to detect switches in ERS, distinguishing 'stable' and 'volatile' respondents in terms of their response style. Overall, ERS is inflated in the intermediate mail component of the panel, whereas preliminary analyses suggest that low education and ideological extremity are drivers of that change. Results are discussed with regards to measurement errors in mixed-mode and longitudinal surveys.

Year: 2013 PMID: 23522006 PMCID: PMC3621025 DOI: 10.1016/j.ssresearch.2013.01.002

Source DB: PubMed Journal: Soc Sci Res ISSN: 0049-089X

Introduction

Researchers in the social sciences and related disciplines make extensive use of standardized self-report surveys as a quantitative data collection tool. Nevertheless, it is well known that this method is susceptible to many flaws or errors that jeopardize the validity of results. The present study focuses on systematic measurement bias coming from the respondent, namely idiosyncratic differences in how individuals make use of response scales in reporting their answers. This phenomenon in surveys is usually defined as ‘response styles’ in the literature (for an overview: Van Vaerenbergh and Thomas, in press). More precisely, this type of bias naturally applies to rating scales (e.g. Likert-type agree-disagree) that are typically used for non-factual subjective measures. This paper investigates extreme response style behavior (hereafter: ERS), a tendency to select endpoints of a response scale. Hence, ERS is assumed to be a systematic component in response patterns which, independent from the true attitude or assessment, biases observed scores. ERS has, for instance, gained growing attention as a source of nuisance in cross-cultural comparative survey research (e.g. Morren et al., 2012). Any idiosyncratic variation in that tendency is important, because extreme scores are used to make substantial inferences, such as ‘attitude extremity’ (Visser et al., 2006), ‘opinionation’ (Krosnick and Milburn, 1990), and ‘attitude polarization’ of issues in public opinion research (Baldassarri and Bearman, 2007). As a common source of variance ERS also artificially inflates correlations or item-factor loadings (Baumgartner and Steenkamp, 2001; Cheung and Rensvold, 2000). 
This study builds on a lively debate about whether response style behavior, in general, is a trait-like and stable individual feature or whether its manifestation is a state which primarily depends on external stimuli of the measurement method or situation (see, for example: Baumgartner and Steenkamp, 2001; Kieruj and Moors, 2013; Weijters, 2006). It is argued that both positions are valid. ERS is indeed a situation-dependent feature, though some individuals can be described as being rather stable (traited) and others as being volatile in this feature (instable, different states). A similar view on a segmented population can be found in latent state-trait (LST) theory (Eid and Langeheine, 2003) as well as in early controversies about the stability of attitudes (Converse, 1964). More precisely, we analyze how ERS varies with the particular data collection method or survey mode used, comparing interview (face-to-face survey) and self-administered (mail-back survey) surveys. A wide literature suggests that response behavior, in general, differs between survey methods, resulting in differential measurement bias (Bowling, 2005; de Leeuw, 2005; Revilla, 2010; Schwarz et al., 1991; Weijters et al., 2008). Turning to ERS and survey mode, most studies coming from independent samples find that extreme responses are triggered in interview or aural modes, especially using CATI (telephone) (Dillman et al., 2009; Groves and Kahn, 1979; Jordan et al., 1980; Martin et al., 1993; Ye et al., 2011). Conversely, an in-depth study of Weijters and colleagues (2008) reports that the level of ERS is similar in telephone and mail surveys, but lower in internet surveys. Though findings are not unequivocal, we argue that differences in ERS originate from distinctive situational aspects of the survey mode, whereas social desirability and ‘satisficing’ strategies (Krosnick, 1991) of respondents play a key role for the manifestation of ERS. 
It stands to reason that we would also find intra-individual variation in ERS for some respondents, conditional on the survey mode. As far as we know, this is the first study to look at this particular phenomenon. For this purpose we use the so-called ‘mixed-mode panel’ design as a unique opportunity to study ERS. It is argued that this design allows us to examine ERS variation intra-individually and longitudinally in a quasi-experimental within-subject design, because we observe the same individuals in each mode. As Vannieuwenhuyze and Loosveldt (in press) explain, it is usually difficult to disentangle the mode effect, which comprises (a) selection effects due to differences in respondent characteristics in different samples and (b) measurement effects or bias in different modes. So, the mixed-mode panel design represents a different approach towards evaluating mode effects (for other approaches: Vannieuwenhuyze and Loosveldt, in press), which tries to avoid confounding of mode and sample selection effects. However, some uncertainty regarding mode effects remains as we usually do not have an additional between-subject control group (a group with a different mixed-mode setting, with equal sample characteristics) alongside the within-subject design. Therefore we cannot rule out so-called learning or carryover effects in the sense that having been surveyed using one mode affects how subjects behave in other modes. It is therefore mandatory that a mixed-mode panel design includes repeated survey modes to better deal with these issues. Besides, while mixed-mode panel studies have become more and more common in recent decades (Couper, 2011; de Leeuw, 2005), we still know very little about the implications of switching modes for potential measurement bias or reliability. In this paper we thus ask: do people change their level of ERS with the survey mode and, if so, why? 
It is important to note that strong variations of response style behavior with the measurement method would be fatal (see also Weijters et al., 2010a,b), since substantive change and artificial change in a variable can be confounded. Inconsistency implies that, besides random error inherent to survey questions, a kind of systematic bias in responses is introduced that is ‘transient’ from occasion to occasion (Le et al., 2009). For instance, the deleterious effect of volatile response behavior is supported by the finding that personality markers yield lower retest reliabilities when switching the survey mode in a panel study (e.g. Lang et al., 2011). This would imply that repeated measures in mixed-mode panels should be used with caution. This study, for the first time, aims at providing insights into intra-individual variation in response style behavior (ERS) that accompanies a mixed-mode panel study. We show how a latent class model, so-called latent transition analysis (LTA), can be used to detect switches as well as stability in a person’s latent level of ERS. The contribution is thus twofold. This study will, first, contribute to a better understanding regarding the nature of survey response styles and the cognitive processes underlying the communication of self-reports. This has implications for studying ‘true change’ in attitudes as opposed to artificial changes. Second, it serves as a resource for applied researchers who work with mixed-mode data. For instance, the latent class model enables researchers to identify respondents who are prone to varying their response behavior based on situational (mode) factors. The paper is structured as follows. We first outline the theoretical perspective on the manifestation of individual response styles using latent state-trait theory. Taking a comprehensive approach, we argue that it is useful to separate respondents that actually change ERS behavior with the measurement situation from those who are rather stable. 
Next, we propose expectations regarding situational aspects in surveys and their impact on ERS. These expectations are examined using unique data from the Irish National Election Study (INES) panel, which uses a mixed-mode design of repeated face-to-face and mail-back surveys. For this purpose we present a latent transition model which helps us to detect variation in the overall level of ERS while allowing us to separate ‘stable individuals’ and ‘variable individuals’. In an exploratory fashion, we also examine some individual covariates of ERS volatility. Finally, the results are summarized and discussed regarding implications for mixed-mode surveys and inherent measurement bias in longitudinal studies.

Perspectives on survey response styles

A large body of the applied survey research literature has been devoted to causes which are responsible for response style bias in public opinion surveys or psychometric tests (Van Vaerenbergh and Thomas, in press). As already mentioned, it is not clear-cut whether this behavior is, at least to a certain extent, a stable individual characteristic, even trait-like, or whether it is largely situation-dependent and ‘pops up’ uniquely at each measurement occasion. Therefore panel surveys are very well suited to examine these questions. We will now summarize the aforementioned arguments with regards to ERS to arrive at a research synthesis.

Stability vs. inconsistency of response styles: trait vs. state

A first major perspective, the disposition argument, conceives response styles as a ‘trait-like’ characteristic or an internal feature of the person (e.g. Couch and Keniston, 1960). According to common definitions, a basic feature of individual traits is temporal stability (Steyer et al., 1999). Supporting the argument, a strand of research finds evidence for considerable cross-time correlations of ERS (Bachman and O’Malley, 1984; Weijters et al., 2010a,b) as well as other response styles (Billiet and Davidov, 2008) when using equal measurement conditions. Similarly, research suggests that ERS patterns seem to endure throughout a questionnaire, across different question content (Weijters et al., 2010a,b) and different response scale formats (Kieruj and Moors, 2013). A second perspective, the situation argument, builds on the assumption that striking response patterns result primarily from external influences due to the measurement situation or stimuli (e.g. Schuman and Presser, 1981; Schwarz et al., 1991). This can be the survey mode or other formal features, such as the response scale design. In early attitude research this led researchers to assume that response styles are a very instable phenomenon (Hui and Triandis, 1985) or even non-existent as an internal feature (Rorer, 1965). However, a dynamic perspective seems more plausible. An individual’s level of ERS may take on different ‘states’ depending on the measurement situation. Essentially, this is what studies coming from (static) independent samples suggest.

The latent state-trait perspective

This study takes an encompassing approach to response styles. We follow latent state-trait (LST) theory (Steyer et al., 1999) to illustrate the disposition-situation controversy in understanding response style behavior. LST theory helps us to understand two important issues in this respect. On the one hand, an individual characteristic derived from a sample of individuals can be more or less stable across measurement occasions (reliability). For instance, the level of stress or test achievement of students might be highly dependent on the situation being measured (instable). On the other hand, particular individuals might differ in being stable or being volatile in a characteristic (also see Eid and Langeheine, 2003). The latter position is in line with attitude research focusing on differential stability of opinions in a population (Converse, 1964). Similarly, Hui and Triandis (1985, p. 260) share this view and state that ‘fluctuation rates of response sets vary across individuals’. Indeed, both perspectives are covered in the two main modeling strategies for longitudinal data, i.e. (1) variable-centered approaches that focus on cross-time correlations and trajectories in continuous variables and (2) person-oriented approaches that identify classes of respondents over time (Laursen and Hoff, 2006). We use the latter approach in order to identify groups of respondents who differ in their pattern of ERS scale usage. This not only allows us to examine the overall (sample) change in the level of ERS during the mixed-mode panel (Step I), but also to identify individuals that can be described as being stable in ERS as opposed to those who are volatile in ERS (Step II). In an exploratory fashion we will then examine some correlates of stability or variability (Step III). In line with LST theory, these factors will point out interactions between personal factors and situational factors (Steyer et al., 1999).

Cognitive aspects of different survey modes

In what follows, we briefly propose expectations on the impact of the survey situation on the manifestation of ERS. The rationale of this investigation is that, from the respondent’s point of view, types of survey modes differ considerably in their situational aspects. These situational aspects would elicit cognitive processes in the survey respondent when handling questions and reporting their answers (for this approach: Schwarz et al., 1991; Tourangeau et al., 2000). For reasons of simplicity we only refer to differences between interview and self-administered modes. While more has been said about survey mode in relation to social desirability or response order effects (e.g. Schwarz et al., 1991; Tourangeau et al., 2000), little is known about the causes of unequal manifestation of ERS. Based on three key situational aspects we link the survey mode with the manifestation of ERS. Following the work of Tourangeau et al. (2000) these are: (a) (im)personality of the situation, (b) cognitive burden, and (c) importance or legitimacy of a survey. As we will see, theoretical explanations might also be conflicting in their expectations. Whether an interviewer is present or not refers to stimuli which are evoked by the social relationship or social norms. This aspect has been studied extensively as a potential measurement bias in sensitive questions (Tourangeau et al., 2000). Regarding ERS, we assume that, independent of the direction of the true position, this stimulus results in avoiding extreme answers (low ERS) on attitude questions as a specific aspect of socially desirable behavior. The rationale is that some persons might want to present themselves in a better light by avoiding extreme opinions or very explicit positions in interview situations (see Kuncel and Tellegen, 2009). This is in line with the idea that social desirability interferes with the process of judgment in the total response process (Tourangeau et al., 2000). 
We thus expect that the ERS level is higher in self-administered waves of a panel than in interview survey waves (Expectation 1). Another approach coming from cognitive psychology focuses on ‘cognitive effort’ or ‘cognitive burden’ of the survey situation. Krosnick’s (1991) hypothesis on ‘satisficing’ in surveys suggests that some respondents make the task of reporting answers in surveys as easy as they can. Thus, they would only give ‘satisficing’ responses. Next, we present two expectations embedded in the satisficing theory, both of which assume that ERS is a consequence of diminished accuracy or superficial cognitive processing. More precisely, after having decided the direction of an opinion (in favor/against, agree/disagree) respondents would consider fewer response alternatives, which makes the choice of endpoints more likely. With regards to cognitive burden of the interview situation, higher pace in surveys conducted by interviewers might play an important role. Self-administration clearly slows down the pace, which gives respondents the opportunity to optimize their responses. This argument would support previous findings on higher ERS in telephone surveys, which are exceptionally prone to this problem (high pace). According to this line of arguments, we expect the ERS level to be lower in self-administered waves of a panel than in interview survey waves (Expectation 2). We also expect respondents to use satisficing strategies, that is, low motivation to optimize their survey responses due to lower importance or legitimacy of the survey situation. Since we assume that in-person interviews are clearly at an advantage with regards to perceived importance or legitimacy, self-administered surveys may be more susceptible to satisficing strategies (higher ERS). In a panel study this effect should apply beyond natural conditioning effects, which can result in decreasing motivation to respond accurately (Cantor, 2008). 
We expect the ERS level to be higher in self-administered waves than in interview survey waves (Expectation 3). To sum up, we arrive at different theoretical expectations with regards to the consequences of satisficing behavior in (independent) interview and self-administered surveys, whereas we have clear expectations concerning social desirability effects. Previous empirical findings are not unequivocal in this respect either. Hence, using the mixed-mode panel study will contribute to a better understanding of ERS in different survey situations.

Data and coding

Sample and panel design

In the following sections our expectations are examined using data from the INES (Irish National Election Study) panel, which ran from 2002 to 2007 (INES, 2008). The original sample is based on the eligible population in Ireland in 2002. Households were chosen at random and then a random respondent was selected within each household. The design of the INES panel is especially suitable for examining response behavior within a mixed-mode panel since it covers several waves and repeated modes. In the 2002 study the initial post-election survey was conducted face-to-face (in-person) using computer assisted personal interviewing (CAPI). The panel component included separate self-completion mail follow-ups with the same sample in 2003, 2004 (after local and EP elections), and 2006. Each of the three mail surveys provided entry into a lottery as an incentive. After the 2007 national election the panel was completed with another face-to-face interview. In all, n = 420 respondents completed all five waves with repeated modes in the panel (i.e. F2F-mail-mail-mail-F2F). With regards to issues of panel attrition (mortality), we find patterns that are consistent with those in previous studies (see Weisberg, 2005, p. 162). Younger individuals (40 years and below), people with lower formal education, and people with low interest in the topic (i.e. politics) show a higher likelihood of panel attrition. Conversely, gender and ideological extremity on the left–right scale (as measured in a self-completion survey drop-off in 2002) had no impact on mortality (detailed results not presented here). Overall, panel attrition should not present a problem for the purposes of this study as we see the panel as a quasi-experimental setting rather than as a representative sample of the whole population. Moreover, we do not have any evidence that the variable studied (change in ERS) has an impact on panel participation.

Measures for ERS

Idiosyncratic survey response styles are defined by an overall pattern of ‘stylistic’ response category selection when answering survey questions. In line with previous research, we assume that the level of ERS is a latent characteristic which manifests itself in a different likelihood of selecting extreme scale categories across several items. According to Weijters (2006) there are several ways to measure response style behavior. To achieve higher validity and reliability in the assessment of a stylistic pattern in survey responses one should follow a simple rule: the more indicators and the more heterogeneous in content (low interitem correlations), the more reliable the measurement will be. However, in practice scholars are limited to measures available in a survey if a study has not been designed explicitly to account for response style phenomena. Furthermore, in order to guarantee comparability of the response style trait itself, one has to keep the stimuli, i.e. question content and design, constant. Though the questionnaires of the INES panel include a large number of attitude measures, only a few items were included in all five waves of the panel (2002, 2003, 2004, 2006, and 2007). We intend to use as many repeated indicators as possible in order to correctly identify the response pattern associated with ERS. We use classical Likert-type questions (7-point A/D, 1 = strongly disagree, 7 = strongly agree) as well as forced choice format questions (11-point). We follow the insight that, as a personal style, ERS seems to occur regardless of the question format or scale length (Kieruj and Moors, 2013). We therefore selected five items (see Appendix A for exact question wording) for all five waves (25 in total):
(1) Environmental threats (7-point A/D) (‘ENVI’).
(2) Limiting number of immigrants (7-point A/D) (‘IMMI’).
(3) British withdrawal from Northern Ireland (7-point A/D) (‘WITH’).
(4) Insist (code 0) on vs. abandon (code 10) the aim of a United Ireland (‘UNIT’).
(5) Ban abortion (code 0) vs. abortion should be freely available (code 10) (‘ABOR’).
To construct the final ERS indicators we recode the 25 variables in the following way: 1 = choosing an extreme category and 0 = choosing any other category (including don’t know and left-out answers in the mail survey). Hence, we ignore the direction of the attitude (positive or negative). Next, we reduce the information of the overall response patterns to identify respondents who are more likely to endorse or reject extreme categories on all items at each point of time.
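The recoding rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `recode_ers`, the `ENDPOINTS` table, and the example respondent are hypothetical, and the endpoint codes (1/7 for the 7-point items, 0/10 for the 11-point items) are taken from the item descriptions in the text.

```python
# Hypothetical helper: recode raw answers into binary ERS indicators.
# Endpoints follow the text: 1 and 7 for the 7-point agree/disagree items,
# 0 and 10 for the 11-point forced-choice items.
ENDPOINTS = {
    "ENVI": (1, 7),
    "IMMI": (1, 7),
    "WITH": (1, 7),
    "UNIT": (0, 10),
    "ABOR": (0, 10),
}


def recode_ers(item, answer):
    """Return 1 if the answer is a scale endpoint, else 0.

    Missing answers (None, e.g. don't know or skipped) are coded 0,
    as described in the text. The direction of the attitude is ignored.
    """
    if answer is None:
        return 0
    low, high = ENDPOINTS[item]
    return 1 if answer in (low, high) else 0


# One made-up respondent in one wave.
wave_answers = {"ENVI": 7, "IMMI": 4, "WITH": 1, "UNIT": None, "ABOR": 10}
ers_pattern = {item: recode_ers(item, a) for item, a in wave_answers.items()}
print(ers_pattern)
```

Applying the same function to each of the five waves would yield the 25 binary indicators used in the latent class models.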

Methodology and statistical models

The following analyses use the mixture modeling or latent class analysis (LCA) framework. This is for two reasons. First, LCA is capable of detecting the choice of specific scale categories in indicators assuming categorical measurement level of indicators. Correlation-based approaches like factor analysis lack this feature. Second, LCA is a person-centered method in terms of the intention to detect unobserved (latent) groups of respondents who exhibit similar characteristics or similar response patterns on items. Other scholars have used LCA methods to detect ERS in previous work using cross-sectional data (e.g. Austin et al., 2006; Eid and Rauber, 2000; Moors, 2003). With regards to longitudinal data the latent class concept can be extended to describe variation in ‘latent states’ (stages) over time (see LST theory). This approach resembles the picture of an individual taking on different states in ERS at each measurement occasion. According to our expectations, some individuals will switch states of high or low ERS during the panel, depending on the different survey modes. As we will see, the mixture modeling approach also allows us to separate stable and volatile respondents. Nevertheless, given the aim of this study the focus is on detection rather than the correction of systematic biases resulting from potential variation in response styles. Further steps in data analysis will be discussed in the final section of this paper.

The latent class model

The rationale and statistical background of LCA are outlined elsewhere in more detail (e.g. Hagenaars and McCutcheon, 2002; Lazarsfeld and Henry, 1968). Nevertheless, we will briefly sketch the main ideas to elaborate the model used here. As is the assumption for other latent variable models, such as factor analysis or item response theory, the basic idea is that a latent variable is responsible for the observed response patterns in manifest indicators. Indicators also differ in how well they distinguish between the classes. Within LCA both the indicators and the latent variable itself are considered to be categorical in nature (i.e. individuals belong to latent classes). Therefore the method is inherently a person-oriented approach, since it tries to assign individuals to categorical groups. The latent class structure is derived from response patterns so that indicators are locally independent, i.e. uncorrelated within each class, whereas the individual’s class assignment is probabilistic. More precisely, the probability of a person belonging to each of the finite classes c (c = 1, 2, 3, …, C) of a latent variable S (here: response style) is computed. Thus, there are two main parameters to be estimated: (A) the probability of a person being in class c, where the classes are, for instance, ‘moderate’ or ‘extreme’ individuals in our example, and (B) the conditional probability of observing a particular response pattern i (e.g. choosing an extreme category) on item $y_k$ (k = 1, 2, 3, …, K) when belonging to class c. This is the relation between indicators and the latent variable, that is, the estimated probability of choosing the extreme category when belonging to one of the classes. 
The probability π of an individual being in class c of the latent variable S is given by $\pi_c^{S}$, whereas probabilities of being in one of the C latent classes of the latent response style variable S always sum to 1:

\[ \sum_{c=1}^{C} \pi_c^{S} = 1 \]

The probability of response pattern i on the kth indicator y is conditional on belonging to class c of the latent variable S:

\[ \pi_{i \mid c}^{y_k \mid S} = P(y_k = i \mid S = c) \]

The number of latent classes is usually determined theoretically or empirically (using model-based fit measures). Here, we decide to define two latent ERS classes/states (C = 2) in the latent response style variable S. Like in previous work, we intend to find ‘Extreme Responders’ as opposed to ‘Moderate Responders’ (e.g. Austin et al., 2006; Eid and Rauber, 2000). This will also keep interpretation simple and the model parsimonious.
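To illustrate how the two parameter types work together, the following minimal sketch computes the posterior class membership of one respondent under local independence. All numbers are hypothetical and are not estimates from the INES data; the function names are likewise illustrative.

```python
def joint_by_class(pattern, class_probs, item_probs):
    """For each class c, return pi_c * prod_k P(y_k | c).

    pattern     : tuple of 0/1 ERS indicators for one respondent
    class_probs : class sizes pi_c (must sum to 1)
    item_probs  : item_probs[c][k] = P(y_k = 1 | class c)
    """
    joints = []
    for pi_c, rho_c in zip(class_probs, item_probs):
        joint = pi_c
        for y, rho in zip(pattern, rho_c):
            # Local independence: multiply item probabilities within a class.
            joint *= rho if y == 1 else (1.0 - rho)
        joints.append(joint)
    return joints


def posterior(pattern, class_probs, item_probs):
    """Posterior probability of each class given the observed pattern."""
    joints = joint_by_class(pattern, class_probs, item_probs)
    total = sum(joints)  # marginal probability of the pattern
    return [j / total for j in joints]


# Hypothetical two-class model: index 0 = moderate, index 1 = extreme.
class_probs = [0.7, 0.3]
item_probs = [
    [0.1, 0.1, 0.1, 0.1, 0.1],  # moderate responders rarely pick endpoints
    [0.6, 0.6, 0.6, 0.6, 0.6],  # extreme responders often pick endpoints
]

post = posterior((1, 1, 1, 0, 1), class_probs, item_probs)
print(post)
```

With four endpoint choices out of five items, almost all posterior mass falls on the ‘extreme’ class in this made-up example, which is the sense in which class assignment is probabilistic.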

Measurement invariance in longitudinal models

So far we have described the static LCA model. For longitudinal analyses it is mandatory to keep the actual meaning of the latent classes or latent states constant. That is, the number and structure of the classes should be the same across time (Nylund, 2007). In order to fulfill this premise, the conditional probabilities of the same (time-dependent) indicator, if belonging to class c, should be restricted to be equal over time (Collins and Lanza, 2010). In other words, we expect the conditional response pattern on item y to be the same at all measurement occasions t (t = 1, 2, 3, …, T), given that a person belongs to class c. We therefore restrict response probabilities to be equal over time, writing $\pi_{i \mid c}^{y_{kt} \mid S_t}$ for the probability of response i on item k at occasion t given class c:

\[ \pi_{i \mid c}^{y_{k1} \mid S_1} = \pi_{i \mid c}^{y_{k2} \mid S_2} = \dots = \pi_{i \mid c}^{y_{kT} \mid S_T} \]

Step I: Latent transition analysis (LTA)

In what follows we present the longitudinal model, which is based on latent transition analysis (LTA) (e.g. Collins and Lanza, 2010). LTA represents a latent class model for stage-sequential processes, which is often used in developmental research. It will be used to detect switches in ERS during the panel waves that use different modes. The change in response style is modeled at the latent class level, which is then connected to observed indicators via a probabilistic model (see above). In other words, we are not concerned with changes in the scores of single variables, but changes in the whole response pattern. However, unlike other applications (e.g. cognitive development) we neither assume stable transitions over time (stable ‘growth’), nor that individuals remain in a state (e.g. higher ERS) once it is ‘achieved’. In LTA a latent class variable S, which is measured by k indicators each, is observed at t points of time in a panel. From the invariance assumption it follows that when response probabilities of class c are restricted to be equal over time, only the probabilities of class membership in S can vary. In doing so, we can model aggregate variation in the size of each class as well as individual transitions among them. Hence, an individual may run through different stages of ERS at different points of time. The initial class probability in the first wave of the panel is given by $\pi_{c_1}^{S_1}$. In terms of a longitudinal study, LTA is most often based on a Markov chain model. Belonging to class/state $c_t$ in $S_t$ is conditional on the previous class membership or state $c_{t-1}$ in the latent variable $S_{t-1}$. More precisely, the model describes a ‘first-order process’, that is, the state at time t is only dependent on the state at time t − 1 (Langeheine and van de Pol, 2002, p. 313):

\[ \pi_{c_t \mid c_{t-1}}^{S_t \mid S_{t-1}} = P(S_t = c_t \mid S_{t-1} = c_{t-1}) \]

This relation is also called ‘transition probability’ to define stability or change in states. Transitions are usually represented as the percentage change from $S_{t-1}$ to $S_t$. 
A transition probability of 1.0 (100%) would indicate complete stability or no change in classes/states for all individuals.
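As a minimal sketch of the first-order Markov logic, the following snippet propagates class sizes from one wave to the next using a transition matrix. The initial distribution and the transition probabilities are hypothetical values chosen for illustration, not results from this study.

```python
def next_distribution(dist, transition):
    """One first-order Markov step.

    dist          : class sizes at wave t-1, e.g. [moderate, extreme]
    transition[i] : probabilities of moving from class i to each class at wave t
    Returns the class sizes at wave t: new[j] = sum_i dist[i] * transition[i][j].
    """
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


# Hypothetical values: 80% moderate and 20% extreme responders in wave 1.
initial = [0.8, 0.2]
# Rows: state at t-1; columns: state at t. Each row sums to 1.
transition = [
    [0.9, 0.1],  # moderate -> moderate / extreme
    [0.3, 0.7],  # extreme  -> moderate / extreme
]

wave2 = next_distribution(initial, transition)
print(wave2)

# Complete stability corresponds to diagonal transition probabilities of 1.0.
identity = [[1.0, 0.0], [0.0, 1.0]]
unchanged = next_distribution(initial, identity)
```

In an actual LTA the transition matrix may differ between each pair of adjacent waves; a single matrix is used here only to keep the sketch short.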

Step II: Separating respondents who vary in their stage-sequential process

In a second step, we try to ‘unmix’ groups of respondents who exhibit a similar intra-individual development and thus a unique dynamic process. In other words, respondents in the sample may exhibit different initial class membership and they might change or remain in a state over time. We capture this aspect using the so called ‘latent mixed Markov model’ which defines a finite number of separate ‘chains’ of development (Langeheine and van de Pol, 2002). Within each chain respondents are supposed to share a common intra-individual development. The most prominent model, which will also be used here, is the so called ‘Mover–Stayer model’ (see Reinecke, 1999). It defines separate chains of individual development: being completely stable in ERS (probability of 1.0 of being in class c with transition probability of 1.0) and those varying their ERS class/state over time. We define a ‘Mover–Stayer’ variable (MS) which assigns respondents to these different transitions. For the purposes of this study we define the Mover–Stayer model as follows: ‘Movers’, a group of respondents varying in ERS, ‘stable Extreme Responders’, and ‘stable Moderate Responders’. We will test whether such a model is superior to contrasting models, such as complete stability in class membership over time.
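The structure of such a Mover–Stayer mixture can be sketched as a weighted sum of three Markov chains over latent states. The sketch below works on latent state sequences only (it leaves out the measurement part linking states to indicators), and all weights and probabilities are hypothetical.

```python
def sequence_probability(states, initial, transition):
    """Probability of a latent state sequence under one first-order chain."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev][cur]
    return prob


# States: 0 = moderate, 1 = extreme. Stayers never leave their state.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical chain weights (they sum to 1) and chain-specific parameters.
CHAINS = {
    "mover": {
        "weight": 0.5,
        "initial": [0.7, 0.3],
        "transition": [[0.6, 0.4], [0.5, 0.5]],
    },
    "stable_moderate": {
        "weight": 0.4,
        "initial": [1.0, 0.0],
        "transition": IDENTITY,
    },
    "stable_extreme": {
        "weight": 0.1,
        "initial": [0.0, 1.0],
        "transition": IDENTITY,
    },
}


def mixture_probability(states):
    """Probability of a state sequence under the Mover-Stayer mixture."""
    return sum(
        chain["weight"]
        * sequence_probability(states, chain["initial"], chain["transition"])
        for chain in CHAINS.values()
    )


# A sequence with switches over five waves can only come from the mover chain.
switching = mixture_probability((0, 1, 1, 1, 0))
print(switching)
```

A constant sequence such as (0, 0, 0, 0, 0) receives probability from both the stayer chain and the mover chain, which is why the model, not the observed pattern alone, decides who counts as ‘stable’.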

Step III: Correlates of change and stability

In the third and last step, we examine individual correlates (x) of belonging to one of the groups in the Mover–Stayer variable, that is, variability in a person’s ERS state. We use three variables that can be linked to the susceptibility to mode effects. We include ‘ideological extremity’ to account for social desirability effects in the present survey. Ideologically extreme respondents might want to avoid extreme opinions on political issues in interview situations and switch response tendencies when filling out a questionnaire. Extremity is measured by folding over a classical 0–10 ideological left–right scale (rescaled to range 0–1, where 1 is extreme). In the 2002 survey left–right position was part of a self-completion drop-off, so we use the 2007 indicator for a face-to-face estimate. For a mail survey estimate we use a composite of the remaining available measurements (2003 and 2004) in the panel (Alpha = 0.53). We use education (in 2002) as a key proxy for different ‘cognitive abilities’, where lower education is expected to be associated with satisficing strategies (Kaminska et al., 2010; Narayan and Krosnick, 1996) that result in a stronger ERS tendency (Greenleaf, 1992; Meisenberg and Williams, 2008; Weijters et al., 2010a,b). We define three educational levels: 1 = ‘none’ and ‘completed primary’, 2 = ‘junior/inter group or equivalent’ and ‘leaving certificate or equivalent’, 3 = ‘diploma or certificate’ up to ‘university degree or equivalent’. To account for the basic motivation to participate in the INES panel we include ‘political interest’ (4-point ‘very interested’ up to ‘not at all interested’, rescaled to range 0–1, where 1 is high interest), which is also a composite of two items measured in 2002 and 2007 (Alpha = 0.71). We also examine the impact of two exogenous variables which were found to be related to the level of ERS. ‘Age’ has been found to play a role in biases associated with satisficing. 
Research suggests that people with higher age tend to have higher ERS levels (Greenleaf, 1992; Meisenberg and Williams, 2008; Weijters et al., 2010a,b). To capture non-linear (u-shaped) effects we construct three groups (based on age in 2002): persons aged below 40 years, persons aged 40–59 years, and persons 60 years and above. Regarding other demographics we include ‘gender’, since there is inconclusive evidence on the direction of gender effects regarding the manifestation of ERS (Meisenberg and Williams, 2008; Weijters et al., 2010a,b). The figure below (Fig. 1) shows the whole model for five panel waves. Latent class variables S and MS are depicted as ellipses, manifest indicators y and x as squares. Membership in MS indicates assignment to classes and different transition probabilities (stable or volatile) in the LTA part.

Fig. 1

Graphical representation of full model.
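The covariate coding described in Step III could be sketched as follows. This is only an illustration under stated assumptions: the scale is assumed to be folded at its midpoint of 5, political interest is assumed to be stored with code 1 for ‘very interested’ and code 4 for the lowest category, and all function names are hypothetical.

```python
def fold_left_right(score, midpoint=5.0):
    """Fold a 0-10 left-right score at the (assumed) midpoint.

    Returns 0 for the centre of the scale and 1 for either pole.
    """
    if not 0 <= score <= 10:
        raise ValueError("left-right score must lie between 0 and 10")
    return abs(score - midpoint) / midpoint


def rescale_interest(code):
    """Map assumed codes 1 (very interested) .. 4 (lowest) onto 0-1, 1 = high."""
    if code not in (1, 2, 3, 4):
        raise ValueError("interest code must be 1, 2, 3 or 4")
    return (4 - code) / 3


def composite(values):
    """Mean of the available (non-missing) measurements, or None if all are missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def age_group(age_2002):
    """Three age groups based on age in 2002."""
    if age_2002 < 40:
        return "<40"
    if age_2002 < 60:
        return "40-59"
    return "60+"


# Example: a respondent placing herself at 8 (2003) and 9 (2004) on the left-right scale.
mail_extremity = composite([fold_left_right(8), fold_left_right(9)])
print(mail_extremity, rescale_interest(2), age_group(47))
```

Folding discards the direction of the ideological position, so a respondent at 1 and a respondent at 9 receive the same extremity score.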

All analyses are carried out using Mplus Version 6 (Muthén and Muthén, 1998–2010).6 We follow Nylund’s (2007) illustration of LTA in Mplus. The MLR estimator (maximum likelihood with robust standard errors) and the EM algorithm (expectation–maximization) are used for parameter estimation. The analyses are based on n = 420 respondents who participated in the whole panel study.

Results

Before we present estimation results in detail, we evaluate the overall model fit of contrasting modeling strategies. Table 1 presents the log-likelihood (LL) as well as the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), where lower AIC and BIC values indicate relatively better fit and parsimony of a model. First, we fitted a model assuming entirely stable ERS tendencies for all individuals across all panel waves, that is, all respondents remain in their respective class/state (model A). Compared with the other models, the AIC and BIC values do not support such a stationary model, so we opt for allowing intra-individual variation. Second, a model with free transition probabilities is estimated, where all individuals may freely change their ERS state from occasion to occasion (model B). Third, we estimate the intended Mover–Stayer model, which includes three separate chains (model C). In the following we only present results of the final model (C), the Mover–Stayer model with different transition probabilities. Although its fit (BIC) is slightly worse than that of the free model (B), it is considerably more informative. The entropy (classification quality) for the latent class part of this model is 0.88 (close to 1), indicating clear class separation.

Table 1

Comparison of model fit measures for different ERS transition models.

Model	Param.	LL	AIC	BIC	n
A. No transition (all remain stable)	11	−5549	11,121	11,165	420
B. Free transition probabilities	19	−5429	10,896	10,973	420
C. M–S-model (with restrictions on transitions)	13	−5443	10,917	10,978	420

Note: All models present a 2-class solution of ERS.
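As a reminder of how the information criteria in Table 1 relate to the log-likelihood, the following sketch applies the standard definitions AIC = 2k − 2LL and BIC = k·ln(n) − 2LL to the rounded values of models A and B. Because the table reports rounded log-likelihoods, recomputed values can differ slightly from the tabled ones; the function names are illustrative.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2LL."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2LL."""
    return n_params * math.log(n_obs) - 2 * log_lik


# (number of parameters, rounded log-likelihood) as listed in Table 1
models = {
    "A. No transition": (11, -5549),
    "B. Free transitions": (19, -5429),
}

for name, (k, ll) in models.items():
    print(name, round(aic(ll, k)), round(bic(ll, k, n_obs=420)))
```

The BIC penalizes each additional parameter by ln(420), roughly 6, compared with 2 for the AIC, which is why the two criteria can rank models with different numbers of parameters differently.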

In what follows we, first, present the results of the latent class model and overall change in ERS during the panel waves. Second, we show transition probabilities as well as intra-individual developmental patterns. Third, we will evaluate correlates of stability or variability in ERS.

Class-specific response probabilities

Next, we present response probabilities of each class, which, by definition, are held equal across all panel waves. Table 2 shows the estimated probability of providing an extreme answer conditional on class membership as well as the odds ratio (O.R.) of selecting the extreme category between the two classes.7 We identified two classes with significantly different response probabilities (styles) on all five items: one class (Class 1) being more moderate in their responses (‘Moderate Responders’) and one class (Class 2) that more often prefers the extreme categories (‘Extreme Responders’).8 For instance, the estimated probability of selecting an extreme category on item WITH is 6% for members of Class 1 (Moderate) and 43% for Class 2 (Extreme) (O.R. = 11.46). However, some indicators (e.g. IMMI, WITH) separate the classes more clearly than others (e.g. ENVI).

Table 2

Conditional response probabilities for extreme categories based on the estimated model.

Issue	Class 1 τ	Class 1 Prob.	Class 2 τ	Class 2 Prob.	O.R.	Δ Prob.
ENVI	1.88	0.13	0.97	0.27	2.49	−0.14⁎
IMMI	1.99	0.12	−0.55	0.64	12.67	−0.51⁎
WITH	2.74	0.06	0.30	0.43	11.46	−0.37⁎
UNIT	1.89	0.13	0.44	0.39	4.28	−0.26⁎
ABOR	1.26	0.22	−0.19	0.55	4.26	−0.33⁎

n = 420

Note: Table represents thresholds, respective probabilities, odds ratios between classes, and difference in probabilities.

⁎ Two-tailed significance p < 0.01.
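The relation between the thresholds (τ), the conditional probabilities and the odds ratios in Table 2 can be made explicit with a short sketch. It assumes the usual logit parameterization for a binary latent class indicator, P(extreme | class) = 1 / (1 + exp(τ)), which is not stated in the text but reproduces the tabled probabilities up to rounding; the function names are illustrative.

```python
import math


def extreme_prob(tau):
    """Probability of an extreme answer given class membership (assumed logit threshold)."""
    return 1.0 / (1.0 + math.exp(tau))


def odds_ratio(tau_class1, tau_class2):
    """Odds of an extreme answer in Class 2 relative to Class 1."""
    return math.exp(tau_class1 - tau_class2)


# Thresholds (Class 1, Class 2) as reported in Table 2
thresholds = {
    "ENVI": (1.88, 0.97),
    "IMMI": (1.99, -0.55),
    "WITH": (2.74, 0.30),
    "UNIT": (1.89, 0.44),
    "ABOR": (1.26, -0.19),
}

for item, (t1, t2) in thresholds.items():
    p1, p2 = extreme_prob(t1), extreme_prob(t2)
    print(item, round(p1, 2), round(p2, 2), round(odds_ratio(t1, t2), 2), round(p1 - p2, 2))
```

A higher threshold thus means a lower probability of choosing an endpoint, and the odds ratio depends only on the difference between the two class-specific thresholds.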

The following graph (Fig. 2) gives an example of how individual posterior (most likely) class membership translates into the originally observed indicators. The graph shows response frequencies for individuals assigned to classes, using questions from the 2003 mail survey wave as an example. We clearly see that for Extreme Responders the extreme categories have the highest frequency, irrespective of the direction of an opinion. Again, we see that one indicator (ENVI) does not strongly differentiate between the classes.

Fig. 2

Conditional response frequencies based on class membership (Mail survey 2003). (Note: n = 420; Do not know or no answer depicted on the left-hand side of the bar chart.)

To further ensure that the latent class variable signals a general tendency in response behavior, we validated class assignment with an external measure (see also Kieruj and Moors, 2013). In a separate analysis we computed a simple sum-score of extreme responses on the remaining 7-point Likert-type items in the questionnaires (61 items in total), which were highly heterogeneous in content. The score was significantly related to initial class membership in the expected direction (p < 0.001) (detailed results not shown here). That is, our latent class variable seems to be congruent with an overall tendency to use extreme categories.
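Such a sum-score can be sketched in a few lines. The treatment of missing answers is an assumption made here for illustration (they are simply not counted), since the text does not specify how they were handled, and the function name is hypothetical.

```python
def ers_sum_score(responses, low=1, high=7):
    """Count endpoint answers on 7-point items.

    `responses` is a sequence of item answers; None marks 'do not know'
    or no answer and is ignored (an assumption, not taken from the paper).
    """
    return sum(1 for r in responses if r in (low, high))


# Example respondent with six items, one of them unanswered
example_score = ers_sum_score([1, 7, 4, None, 7, 2])
print(example_score)
```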

Class sizes and changing level of ERS

As already mentioned, in LTA only the class probability or class membership is supposed to vary over time. We already demonstrated that a model in which all respondents remain in the same class/state turned out to be inappropriate. If response behavior differs between the panel waves, the respective class sizes change at the aggregate level. We expected to find more variation of ERS between different modes, but less within the same mode, and this is in fact what we find in our analyses. The following graph (Fig. 3) shows class sizes in each of the five waves based on the most likely class membership, which uniquely assigns each respondent to the class with the highest membership probability.

Fig. 3

Class sizes across panel waves according to most likely class membership (n = 420).

The distribution of ERS patterns in the two face-to-face waves (S1, S5) indicates that the group of Moderate Responders is clearly the larger one. However, the shares are not identical in S1 and S5, which suggests that the distribution in the two surveys differs in some respect. In the three mail survey waves (S2, S3, S4), by contrast, Extreme Responders and Moderate Responders are almost equal in size. In other words, we clearly see that, at the aggregate level, the syndrome of ERS is significantly more pronounced during the self-administered mail waves. Since the level of ERS changes with mode but shows stability between the second and the fourth wave, we interpret this effect as a mode effect rather than panel conditioning. Panel conditioning would mean that respondents show learning effects or become more certain in their attitudes because of participating in the panel (Cantor, 2008).

Switches in latent ERS state and transition probabilities

So far we have seen considerable changes in the latent ERS classes at the aggregate level. Yet, we lack information on intra-individual variation in response behavior from one occasion to another. Next, we present the results of latent transitions from each wave to the following (simple Markov chain) for the whole sample. Fig. 4 shows transition probabilities, that is, the amount of change or stability in class membership that occurs across panel waves.

Fig. 4

Transition probabilities according to most likely class membership (n = 420). (Note: Figure indicates the percentage of respondents moving to a particular class in the subsequent panel wave.)

Most strikingly, we find that variation in ERS occurs almost exclusively when the survey mode changes, whereas class membership remains highly stable during the three mail panel waves. Fig. 4 shows that a considerable share (38%) of initially Moderate Responders changes to more extreme response tendencies in the second wave (S1 to S2). None of the respondents switched from extreme style to moderate style between the first face-to-face and the subsequent mail wave, so we fixed this parameter to zero in the estimation. Between the three mail survey waves (from S2 to S3 and from S3 to S4) we find no significant transitions, that is, clear evidence for high intra-individual stability. The stable aggregate-level distributions within the mail surveys are thus a product of intra-individual stability of the ERS classes. Finally, from the last mail survey to the final face-to-face survey, almost half (48%) of all Extreme Responders in S4 switch back to moderate responding in S5. This pattern, too, provides evidence for actual mode effects being at work.

Separating stable and volatile individuals: The Mover–Stayer variable

The overall pattern of transitions still obscures intra-individual development, since transition probabilities for the whole sample also include those who are stable in some way. It follows that only volatile individuals account for the aggregate change we have seen in Fig. 3. Hence, it is our aim to distinguish between truly volatile respondents and stable respondents. For this purpose we separated groups of respondents according to different parallel Markov chains (Mover–Stayer variable). Table 3 shows chain sizes and patterns of class membership. Besides the two stable groups, we only find two characteristic patterns among Movers when using the most likely class membership (model C). That is, we are able to demonstrate very clear patterns of intra-individual variation or stability in ERS.

Table 3

Transition patterns of different chains in the Mover–Stayer variable.

				Class/state pattern
Chain	Name	Chain size (%)		1-F2F	2-Mail	3-Mail	4-Mail	5-F2F
1	Stable moderate	51		M	M	M	M	M
2	Stable extreme	16		E	E	E	E	E
3	Movers	33	23%	M	E	E	E	M
			10%	M	E	E	E	E

n = 420		100

Note: M = moderate, E = extreme.

The chain sizes are as follows: 51% stable Moderate Responders, 16% stable Extreme Responders, and 33% Movers. Regarding the stability of response styles, we would thus argue that about two thirds of the respondents are quite stable in their level of ERS. The Movers, in contrast, share a common change from moderate to extreme from the first to the second survey wave. In addition, almost one out of four respondents (23%) changes his or her response style strictly in line with the survey mode. This is a strong indication of a true mode effect among these individuals.
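The link between the chain patterns in Table 3 and the aggregate results reported earlier can be traced with a small sketch. It uses only the rounded chain shares and state patterns from Table 3, so the derived percentages only approximately match the class sizes in Fig. 3 and the transition shares in Fig. 4; the function names are illustrative.

```python
# (share of the sample in %, state pattern over S1..S5) as reported in Table 3
patterns = [
    (51, "MMMMM"),  # stable moderate
    (16, "EEEEE"),  # stable extreme
    (23, "MEEEM"),  # Movers who follow the survey mode
    (10, "MEEEE"),  # Movers who stay extreme in the final wave
]


def extreme_share(wave):
    """Percent of the sample in the extreme state at a wave (0 = S1, ..., 4 = S5)."""
    return sum(share for share, pattern in patterns if pattern[wave] == "E")


def transition_share(wave, origin, target):
    """Percent of those in `origin` at `wave` who are in `target` at the next wave."""
    at_origin = sum(s for s, p in patterns if p[wave] == origin)
    moved = sum(s for s, p in patterns if p[wave] == origin and p[wave + 1] == target)
    return 100.0 * moved / at_origin


print([extreme_share(w) for w in range(5)])
print(round(transition_share(0, "M", "E"), 1))  # moderate -> extreme, S1 to S2
print(round(transition_share(3, "E", "M"), 1))  # extreme -> moderate, S4 to S5
```

Read this way, the near-equal class sizes in the mail waves and the two large transitions at the mode switches follow directly from the Movers, while the two Stayer chains contribute nothing to aggregate change.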

Correlates of change or stability

Finally, we present results of bivariate relationships between class membership in the Mover–Stayer variable and some individual covariates. We ask whether there are certain individual factors which are associated with intra-individual stability or volatility in response behavior. Table 4 presents cross tables and mean comparisons. The results replicate general results on ERS differences from previous studies. Stable Extreme Responders are relatively less well educated, they report more extreme scores on the left-right ideological scale, and are somewhat older than respondents with a moderate survey response style.

Table 4

Means and distributions of covariates within latent classes.

		Stable moderate	Stable extreme	Movers	n
L–R extremity	Mail	0.29 (0.02)	0.38 (0.04)	0.43 (0.03)	324
	F2F	0.27 (0.02)	0.40 (0.05)	0.36 (0.03)	380

Education	Low	10%	21%	28%	74
	Medium	42%	48%	47%	188
	High	47%	30%	25%	158

Political interest	Average	0.64 (0.02)	0.65 (0.03)	0.65 (0.02)	420

Age	<40	30%	14%	22%	103
	40–59	47%	55%	45%	200
	60+	23%	32%	33%	116

Gender	Female	48%	48%	45%	198
	Male	52%	52%	55%	222

Note: Entries indicate means with S.E. in brackets or column percentages. For coding of variables, see text. Sample size n refers to valid cases used for the analyses.

With regard to the group that changes its level of ERS, we find some evidence that this group is ideologically even more extreme. In particular, left-right extremity scores among Movers are significantly higher in the mail survey average, where we would expect more honest responding, than in the face-to-face measurement (Wilcoxon signed-rank test, p = 0.04). For the other respondents, left-right extremity does not differ significantly between the two modes (p ⩾ 0.10). Table 4 also shows that Movers are among the lowest educated (chi2(4) = 30.09, p < 0.01). Whereas higher extremity would support the reduced social desirability expectation, lower education would support expectations concerning satisficing effects among lower educated individuals. We also find that the three groups differ in age (chi2(4) = 10.10, p = 0.04); stable extreme responding in middle adulthood could be related to a higher level of attitude importance and attitude certainty at this age (Visser and Krosnick, 1998). However, due to small sample sizes we were not able to establish statistically significant differences between stable Extreme Responders and respondents with volatile response behavior (Movers). Finally, we do not find any difference with regard to political interest (Kruskal–Wallis test, p = 0.98) or gender (chi2(2) = 0.44, p = 0.80).

Discussion and conclusion

The aim of this paper was to investigate intra-individual stability and variation in extreme response style behavior (ERS) as a source of measurement bias within mixed-mode panel studies. A latent transition model provided evidence that a considerable share of respondents turns to pronounced ERS behavior during the mail surveys, so that more individuals show the characteristic pattern of ERS in these panel waves. There are two plausible explanations for these findings. On the one hand, the expression of extreme opinions on sociopolitical issues may increase due to the anonymity provided by self-administered surveys. On the other hand, lower accuracy or ‘careless’ responding may result from low importance or fatigue in the mail survey situation, that is, satisficing strategies. These theoretical expectations are indeed supported by our analysis of individual covariates. Ideological extremity plays a role in ‘admitting’ extreme responses, but only in self-administered mail surveys. Social desirability effects may hence be present in the interview survey, when respondents are ‘being watched’. This may be especially true for emotional political issues. At the same time, we find support for the argument that lower education moderates response biases in surveys, i.e. satisficing behavior: lower educated respondents more often switch their response behavior across modes. It therefore remains a matter of debate whether responses in the self-administered (mail) surveys are simply more honest or less ‘optimized’ in the sense of response quality.

The findings of this study have implications for the methodology of the self-report survey method and for substantive inferences drawn from mixed-mode survey data. First, variation in response style jeopardizes the comparability of different data types (modes), since responses are affected by a systematic source other than what the items were specifically designed to measure (Paulhus, 1991, p. 17). This is the definition of systematic measurement bias, which is ‘transient’ or unstable over different occasions of measurement (Le et al., 2009). Second, regarding the debate on the nature of individual response styles in surveys, it seems promising to treat respondents as heterogeneous groups. Some respondents are quite stable, while others exhibit higher volatility in that behavior. This would be especially valuable if we were able to determine an individual’s propensity to vary his or her style. A latent class variable could be used as a ‘marker variable’ in further analyses, for instance. Therefore, the person-oriented (latent class) approach provides a more detailed picture than correlation-oriented approaches (e.g. Billiet and Davidov, 2008; Weijters et al., 2010a,b), since stability (reliability) measures of response styles are inherently a characteristic of the particular population (see Alwin, 2007). Third, we found that extremity on rating scales is, at least to some extent, independent of extremity of attitudes toward an issue or object. This has substantial implications for attitude strength measures or public opinion research. For instance, our results would suggest that people are more ‘polarized’ in their attitudes and more ‘opinionated’ on several issues when there are no campaigns or elections, which contradicts what we would assume from the electoral cycle.

Nevertheless, this study has several limitations. We used a quasi-experimental design where survey mode is not a strict treatment condition. Different types of survey modes (situational aspects) could be explored as an experimental treatment of intra-individual response style variation in interaction with other individual characteristics. Moreover, it is worth mentioning that the survey data used were not explicitly designed for the assessment of response style behavior. Other factors which might be associated with a switch in response behavior are not covered here. For instance, we lack information on more elaborated measures of social desirability (e.g. Paulhus, 1991) or certain personality profiles (e.g. Kieruj and Moors, 2013). Also, we cannot ignore the possibility that other panel designs, e.g. initial recruitment with a mail survey, may produce different results in response behavior. Only a fully randomized study with a mixed-mode-recruitment and mixed-mode-switch panel design would be able to fully address these issues. While the findings on individual covariates might be limited to the particular survey modes used here, it stands to reason that findings of mail surveys translate to the area of self-administered web surveys, for instance.

Finally, we have some suggestions for further research. In general, theories on cognitive aspects in surveys and changing levels of response biases in surveys could be the subject of future research using variants of the mixed-mode panel design. While this study was devoted to detecting rather than correcting systematic bias introduced by varying response behavior in longitudinal surveys, scholars should also endeavor to develop and apply correction methods or statistical remedies. Still, very few approaches are available for longitudinal data, where varying systematic error or method bias might play a role. The work of Baumgartner and Steenkamp (2006) is among the first to provide a structural equation modeling approach which decomposes variance components in order to correct latent means of a construct. In sum, any model that allows for separating true score change, method variance, transient systematic error and random measurement error in longitudinal studies is to be preferred.

6 in total

Review 1. The great response-style myth.

Authors: L G Rorer
Journal: Psychol Bull Date: 1965-03 Impact factor: 17.737

2. Yeasayers and naysayers: agreeing response set as a personality variable.

Authors: A Couch; K Keniston
Journal: J Abnorm Soc Psychol Date: 1960-03

3. The stability of individual response styles.

Authors: Bert Weijters; Maggie Geuens; Niels Schillewaert
Journal: Psychol Methods Date: 2010-03

Review 4. Mode of questionnaire administration can have serious effects on data quality.

Authors: Ann Bowling
Journal: J Public Health (Oxf) Date: 2005-05-03 Impact factor: 2.341

5. Development of attitude strength over the life cycle: surge and decline.

Authors: P S Visser; J A Krosnick
Journal: J Pers Soc Psychol Date: 1998-12

6. Short assessment of the Big Five: robust across survey methods except telephone interviewing.

Authors: Frieder R Lang; Dennis John; Oliver Lüdtke; Jürgen Schupp; Gert G Wagner
Journal: Behav Res Methods Date: 2011-06