
A multidimensional pairwise comparison model for heterogeneous perceptions with an application to modelling the perceived truthfulness of public statements on COVID-19.

Qiushi Yu, Kevin M. Quinn.

Abstract

Pairwise comparison models are an important type of latent attribute measurement model with broad applications in the social and behavioural sciences. Current pairwise comparison models are typically unidimensional. The existing multidimensional pairwise comparison models tend to be difficult to interpret and they are unable to identify groups of raters that share the same rater-specific parameters. To fill this gap, we propose a new multidimensional pairwise comparison model with enhanced interpretability which explicitly models how object attributes on different dimensions are differentially perceived by raters. Moreover, we add a Dirichlet process prior on rater-specific parameters which allows us to flexibly cluster raters into groups with similar perceptual orientations. We conduct simulation studies to show that the new model is able to recover the true latent variable values from the observed binary choice data. We use the new model to analyse original survey data regarding the perceived truthfulness of statements on COVID-19 collected in the summer of 2020. By leveraging the strengths of the new model, we find that the partisanship of the speaker and the partisanship of the respondent account for the majority of the variation in perceived truthfulness, with statements made by co-partisans being viewed as more truthful.
© 2022 Royal Statistical Society.


Keywords:  COVID‐19; Dirichlet process mixture model; pairwise comparison model; public opinion

Year:  2022        PMID: 35600510      PMCID: PMC9115520          DOI: 10.1111/rssa.12810

Source DB:  PubMed          Journal:  J R Stat Soc Ser A Stat Soc        ISSN: 0964-1998            Impact factor:   2.175


INTRODUCTION

The ability to measure latent attributes, as well as differences in how humans perceive these latent attributes, is important for many research areas in the social and behavioural sciences. Examples include research on: perceptions of skin colour (Massey & Martin, 2003), perceptions of racial stereotypicality (Eberhardt et al., 2006), cross-cultural differences in values (Oishi et al., 2005), perceptions of the compactness of legislative districts (Kaufman et al., 2021), understandings of political freedom and political efficacy across cultures (King et al., 2004), and evaluations of biomedical images (Phelps et al., 2015), among many others. Likert-type scales are one commonly used approach to measure such latent attributes and concepts. However, it is widely understood that human raters may use Likert-type scales differently, and this can make it difficult to interpret the resulting estimates (see, for instance, Bachman & O'Malley, 1984; Brady, 1985; Hannon & DeFina, 2014; King et al., 2004; Suchman & Jordan, 1990). Furthermore, some approaches using Likert-type scales require the human raters to memorize the relationship between the numbered scale categories and the underlying construct being measured. These cognitive demands can make it more difficult for human raters to use the scale as intended, leading to very low inter-rater reliability (Hannon & DeFina, 2016). Pairwise comparison methods (Bradley & Terry, 1952; David, 1963; Thurstone, 1927), which elicit binary judgments regarding pairs of items, are less susceptible to these problems. Because these approaches only ask human raters to make a series of binary judgments about which of two paired items has more of the latent attribute of interest, the cognitive demands on the raters are relatively low.
Furthermore, there is evidence that approaches using paired comparison data are less likely to suffer from design artefacts due to survey question wording or the labelling of the Likert‐type scale categories (Oishi et al., 2005) and are more accurate when compared to objective, gold standard measures (Phelps et al., 2015). Nonetheless, existing models for paired comparison data are not without their own limitations. Standard models (Bradley & Terry, 1952; Thurstone, 1927) assume that the latent attribute space is unidimensional and that there are no differences in how raters perceive the latent attributes of the paired items. This eliminates the possibility of estimating whether some types of raters perceive the items in question differently than other raters. Understanding such perceptual differences is an important part of recent research in a number of fields (Abrajano et al., 2021; Hannon & DeFina, 2014; Neiss et al., 2009). As discussed in Section 2.1 below, Carlson and Montgomery (2017) include a rater‐specific parameter in their model that allows for differences in perceptual sensitivity across raters. While this approach is an improvement, it still assumes that each rater projects the latent item‐specific attributes onto the same unidimensional space as all other raters. This can be a serious limitation in applications where the perceptual differences across raters are more extreme. Attempts have been made to develop pairwise comparison models with multidimensional latent attribute spaces (Balakrishnan & Chopra, 2012; Carroll & De Soete, 1991; Yu & Chan, 2001). However, the rater‐specific parameters in these approaches are effectively treated as nuisance parameters which again limits what can be learned about rater‐specific perceptual differences. In this article, we address these limitations by developing a new model for pairwise comparisons. 
The proposed model assumes that each rater's perceptions are a weighted average of the multidimensional latent attributes of the items. This parameterization allows for a straightforward interpretation of the perceptual differences across raters and how those perceptual differences manifest as differences in binary judgments. This is a major advantage over previous multidimensional pairwise comparisons models where interpretation of the rater‐specific parameters is not so straightforward. Moreover, in the second version of the new model, we add a Dirichlet process prior on the rater‐specific parameters. This allows us to flexibly cluster raters into groups that share similar perceptual orientations. This ability to group raters by the similarity of their shared perceptions can advance our understanding of how perceptions vary across raters. In addition, examining the associations between rater characteristics and the rater‐specific perceptual parameters provides us with information on how perceptual differences vary across identifiable groups of individuals. We conduct multiple simulation studies with varying amounts of data to demonstrate how the proposed models and model‐fitting algorithms work in practice. These studies demonstrate that both versions of the new model are able to accurately estimate the parameter values based on simulated binary choice data. Moreover, we observe that estimation accuracy improves as the datasets become larger. We apply the new model to original survey data on the perceived truthfulness of public statements made about COVID‐19. Our analysis sheds important light on how individuals perceive information on COVID‐19 and how their perceptual orientation covaries with their personal characteristics and political beliefs. We hypothesize that perceptions of truthfulness are influenced by two distinct attributes of the statements: (1) the objective truthfulness of the statements, and (2) the political valence of the statements. 
Applying our new model to our survey data, we find that the objective truthfulness of a statement only weakly correlates with survey respondent perceptions of truthfulness. On the other hand, we find strong correlations between the political valence of the statements and their perceived truthfulness, moderated by the political leaning of the respondent. Statements made by a co-partisan of a respondent tend to be viewed as more truthful by that respondent. A sizable fraction of respondents gauge the truthfulness of COVID-19 statements through partisan lenses. For these respondents, partisanship has a stronger impact on their responses than does the actual truthfulness of the statements. Indeed, the responses from the most right-leaning respondents are negatively correlated with the objective truthfulness of the statements. That said, a plurality of respondents are relatively unswayed by partisanship but have a difficult time accurately gauging the truthfulness of COVID-19 statements. We also observe associations between the respondent-specific perceptual parameters and public-health-relevant behaviours, such as mask wearing and social distancing. The remainder of this paper proceeds as follows. First, we detail the new multidimensional pairwise comparison model. Second, we summarize the results from simulation studies of the model. Third, we describe the COVID-19 survey design and data collection procedure. Fourth, we apply the new model to the pairwise comparison data collected in the survey, reporting and comparing the results from existing unidimensional models and the newly proposed multidimensional model. We conclude by discussing our findings.

A NEW MODEL FOR PAIRWISE COMPARISONS DATA WITH HETEROGENEOUS PERCEPTIONS

Traditional models for pairwise comparisons assume that objects have unidimensional latent attributes (Bradley & Terry, 1952; David, 1963; Thurstone, 1927). Some researchers add a unidimensional respondent‐specific parameter to account for respondents' different levels of ability or sensitivity (Carlson & Montgomery, 2017). There have also been attempts to generalize pairwise comparison models to multidimensional latent spaces (Balakrishnan & Chopra, 2012; Carroll & De Soete, 1991; Yu & Chan, 2001). In this section, we briefly review the existing pairwise comparison models, before we introduce our new model.

Existing models

We start by reviewing unidimensional pairwise comparison models. Consider a set of J objects {o_1, …, o_J}. We assume that each o_j has a latent attribute θ_j ∈ ℝ that denotes an attribute of interest. While θ_j is unobserved, we do observe y_{ijj′}, the result of a paired comparison of o_j and o_{j′} by respondent i, in which i is asked to make a ranking judgment as to whether o_j or o_{j′} has a larger value of the latent attribute. y_{ijj′} is equal to 1 if respondent i judges o_j to have a larger value of the latent attribute in question than o_{j′}, and 0 otherwise. More specifically, we assume

Pr(y_{ijj′} = 1) = F(θ_j − θ_{j′}),

where F(·) is a cumulative distribution function. If F(·) is the standard normal distribution function, the model above is the Thurstone model (Thurstone, 1927). If F(·) is the logistic distribution function, the model above is the Bradley–Terry model (Bradley & Terry, 1952). A variant of the above model assumes that respondents vary in their ability or sensitivity to discern the latent differences between objects, while the latent object attributes remain on the real line, that is, θ_j ∈ ℝ for j = 1, …, J. Here

Pr(y_{ijj′} = 1) = F(β_i (θ_j − θ_{j′})),

where β_i is a respondent-specific sensitivity parameter. Typically, it is assumed that β_i ∈ ℝ for i = 1, …, N (Carlson & Montgomery, 2017). In addition to the unidimensional models, researchers have also proposed multidimensional pairwise comparison models (Cattelan, 2012). For the multidimensional models, both objects and respondents are assumed to have locations in a common latent space. Between any two objects, a respondent prefers the object whose latent location is closer to that of her own. The d-dimensional wandering vector model is a typical multidimensional pairwise comparison model (Carroll & De Soete, 1991), the setup of which is as follows:

Pr(y_{ijj′} = 1) = Φ(z_i · θ_j − z_i · θ_{j′}),

where θ_j ∈ ℝ^d for j = 1, …, J, z_i ∈ ℝ^d for i = 1, …, N, and Φ(·) is the standard normal CDF. A respondent vector, z_i, is a unit-length vector with non-negative elements. No estimation method was provided for this model when it was originally proposed. Later, MCMC methods were proposed for the wandering vector model (Balakrishnan & Chopra, 2012; Yu & Chan, 2001).
However, these approaches do not constrain a respondent vector to be a unit-length non-negative vector. Instead, they place a multivariate normal prior on all respondent vectors, such as N(μ, Σ). These weaker constraints on respondent attributes lessen the interpretability of the model, since the dot product of a respondent vector and an object vector is no longer a weighted average of the object's attributes on the different dimensions.
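To make the unidimensional setup concrete, the two classical choice probabilities can be sketched in a few lines of Python. This is an illustration only (the function names are ours, not the authors'):

```python
import math

def thurstone_prob(theta_j: float, theta_k: float) -> float:
    """Pr(rater judges object j > object k) under the Thurstone model,
    where F is the standard normal CDF."""
    diff = theta_j - theta_k
    return 0.5 * (1.0 + math.erf(diff / math.sqrt(2.0)))

def bradley_terry_prob(theta_j: float, theta_k: float) -> float:
    """The same comparison under the Bradley-Terry model,
    where F is the logistic CDF."""
    return 1.0 / (1.0 + math.exp(-(theta_j - theta_k)))

# Equal latent attributes: either judgment is equally likely under both models.
print(thurstone_prob(0.3, 0.3))      # 0.5
print(bradley_terry_prob(0.3, 0.3))  # 0.5
```

Both functions are monotone in the attribute difference θ_j − θ_k, which is the defining feature shared by the two models; they differ only in the tail behaviour of F.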

A new multidimensional model

The unidimensional pairwise comparison models discussed above have important limitations. They either assume no perceptual differences between respondents, or they assume that respondents only vary in the ability or sensitivity to discern the object attribute differences. Moreover, the unidimensional attribute assumption is overly strong when respondents evaluate objects on more than one latent dimension. Furthermore, respondents may differentially weight the attributes that correspond to different latent dimensions. Existing multidimensional pairwise comparison models are difficult to interpret due to their lack of constraints on the respondent‐specific parameters. When respondent‐specific parameters are not constrained to be unit‐length non‐negative vectors, these parameters cannot be easily viewed as dimension‐specific weights. In addition, the existing models do not allow for any clustering among the respondent‐specific parameters that would represent shared perceptual frameworks among respondents. To address these issues, we propose a new multidimensional pairwise comparison model. We detail two versions of this model—each corresponding to a different prior distribution for the respondent‐specific parameters. In this new model, we operationalize a unit‐length weight vector for each respondent with trigonometric functions. This allows us to model a respondent's perception of an object as the weighted average of the object's attributes on each latent dimension. The model therefore allows researchers to estimate how multiple latent sub‐attributes are aggregated into a general latent attribute, and to assess the extent to which respondents differ in their construction of the general attribute from the sub‐attributes. In the first version of the model we assume a uniform prior for these respondent‐specific parameters. In the second version, we assume a Dirichlet process prior on the respondent‐specific parameters. 
This second model allows researchers to learn how perceptual frameworks cluster among respondents and how various respondent characteristics relate to respondent perceptions of the latent attributes of interest. We begin with the special case of a two-dimensional latent attribute space. Once again, consider a set of J objects {o_1, …, o_J}. However, we now assume each o_j has latent attributes that can be represented by a location in two-dimensional Euclidean space: θ_j ∈ ℝ². We assume that respondents differ in the weights they place on each of these two dimensions. More specifically, respondent i's judgment depends on a unit vector g(γ_i) with γ_i ∈ [0, π/2] in the following way:

Pr(y_{ijj′} = 1) = Φ(g(γ_i) · θ_j − g(γ_i) · θ_{j′}),

where Φ(·) is the CDF of a univariate standard normal distribution, and · denotes the dot product between two vectors. Intuitively, respondent i projects the latent attributes onto g(γ_i) and then uses the signed distance between the projected points to compare two objects. This is depicted graphically in Figure 1.
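A minimal sketch of this choice probability, assuming the convention g(γ) = (cos γ, sin γ) for the unit weight vector (the paper's exact trigonometric parameterization may differ; the function name is ours):

```python
import math

def choice_prob(gamma, theta_j, theta_k):
    """Pr(respondent judges object j > object k) in the two-dimensional model:
    project both objects onto the unit weight vector g(gamma) and pass the
    signed difference of the projections through the standard normal CDF."""
    g = (math.cos(gamma), math.sin(gamma))  # unit vector in positive quadrant
    diff = g[0] * (theta_j[0] - theta_k[0]) + g[1] * (theta_j[1] - theta_k[1])
    return 0.5 * (1.0 + math.erf(diff / math.sqrt(2.0)))

# A respondent with gamma = 0 weights only dimension 1, so even a large
# difference on dimension 2 is ignored entirely:
print(choice_prob(0.0, (1.0, -5.0), (1.0, 5.0)))  # 0.5
```

Varying γ between 0 and π/2 moves the weight continuously from dimension 1 to dimension 2, which is what makes the respondent-specific parameter directly interpretable.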
FIGURE 1

Example of the two-dimensional latent attribute model. In the left panel, respondent 1 places much more emphasis on dimension 1 (the horizontal dimension). As a result, this individual is only slightly more likely to evaluate one of the two depicted objects as being preferred to the other. In the right panel, respondent 2 gives weight to both of the latent dimensions, with slightly more weight placed on dimension 2 (the vertical dimension). As a result, this individual is much more likely to evaluate that object as being preferred to the other.

A semi-conjugate prior distribution for the statement parameters is a bivariate normal distribution, θ_j ∼ N₂(μ₀, Σ₀). A number of priors for the respondent-specific parameters γ_i are reasonable. We consider two, and introduce the two versions of the new model in the following subsections.

Uniform prior

The most intuitive prior on γ_i is the uniform prior: γ_i ∼ Uniform(0, π/2). This specification has the advantage of simplicity, but it does not allow for the possibility that γ_i is exactly equal to γ_{i′} for any two individuals i and i′. Such grouping may be desirable if we are interested in making inferences about the extent to which respondents share the same perceptual framework for evaluating the latent attributes in question. Furthermore, allowing γ_i to equal γ_{i′} with positive probability is also useful in situations where respondents only rate a small-to-moderate number of paired comparisons. In these situations, allowing some form of clustering among the γ parameters will lower the variance of the resulting estimates of the γ parameters.

Dirichlet process prior

An alternative is to assume that each γ_i is drawn from a distribution G that is itself drawn from a Dirichlet process. More formally,

γ_i | G ∼ G,    G ∼ DP(α, G₀),

where α is a concentration parameter and G₀ is the centring distribution, which is specified as a uniform distribution on [0, π/2]. α could be either fixed at a constant value or given a prior distribution and estimated. If α is to be estimated, then we assume a Gamma prior distribution for it. While the Dirichlet process prior for γ complicates estimation, it has the advantage of allowing for the possibility of perceptual clustering among respondents. The new models can be generalized to d > 2 dimensions by assuming that each θ_j ∈ ℝ^d, with the perceptual unit vectors constrained to lie in the positive orthant of ℝ^d. γ_i would be (d − 1)-dimensional in this case, with each element being an angle that keeps the unit vector in the positive orthant. For example, if d = 3, then we can use γ_i = (γ_{i1}, γ_{i2}) to represent a unit-length vector in the positive orthant of ℝ³. The MCMC algorithms we use for fitting these two versions of the model are discussed in Appendix A.
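The d = 3 case can be illustrated with spherical coordinates. The specific trigonometric convention below is one of several equivalent choices and is our assumption, not necessarily the paper's:

```python
import math

def g3(gamma1, gamma2):
    """Map two angles in [0, pi/2] to a unit-length vector in the positive
    orthant of R^3 (spherical-coordinate convention; one of several options)."""
    return (math.cos(gamma1),
            math.sin(gamma1) * math.cos(gamma2),
            math.sin(gamma1) * math.sin(gamma2))

v = g3(0.4, 1.1)
print(abs(sum(x * x for x in v) - 1.0) < 1e-12)  # True: unit length
print(all(x >= 0.0 for x in v))                  # True: non-negative weights
```

Because cos²γ₁ + sin²γ₁(cos²γ₂ + sin²γ₂) = 1, the vector is unit-length for any angles, and restricting both angles to [0, π/2] keeps every element non-negative, preserving the weighted-average interpretation.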

SIMULATION STUDY

We conduct simulation studies to illustrate how the samplers for the two versions of the new model work. The experiments show that both versions of the new model are able to recover the true latent variable values from the observed binary choice data. For both versions of the new model, we specify four configurations of respondent number, I, and object number, J: (I = 40, J = 40), (I = 40, J = 80), (I = 80, J = 40), (I = 80, J = 80). For each configuration, we repeat the simulation steps 50 times, resulting in 50 simulated data sets for each configuration. We use two measures to gauge how well the model can uncover the true latent variable values: the correlations between the estimated parameters and the true values, and the mean squared error (MSE) of the estimated parameters. We compute the correlations and MSEs for the results from each simulated data set under each simulation configuration. The results of the simulation studies of the first model, with the uniform prior on γ_i, can be summarized as follows. In the simulations with the least information (I = 40 and J = 40), the modal correlation between the estimated γ's and their true values is approximately 0.85 and the mode of the MSEs is approximately 0.09. As J increases from 40 to 80, the modal correlation between the estimated γ's and their true values increases to approximately 0.95 and the modal MSE value for these parameters drops to about 0.03. Increasing the number of objects to be rated (J) does more to increase the precision of the estimated γ's than increasing the number of raters (I). In the simulations with I = 40 and J = 40, the modal correlations between the estimated θ's and their true values are around 0.9, and the mode of the MSEs is around 0.25. As I increases to 80 and J increases to 80, the modal correlation between the estimated θ's and their true values increases to approximately 0.97 and the modal MSE value for these parameters drops to about 0.07.
The simulations suggest that accurate estimation of θ depends, to a greater extent, on both the number of raters (I) and the number of objects being rated (J) than does accurate estimation of γ, which is more heavily dependent on J. The summary for the simulation study on the second version of the new model is as follows. Under the simulation configuration with I = 40 and J = 40, the modal correlation between the estimated γ values and their true values is around 0.88 and the mode of the MSEs is approximately 0.07. As J increases from 40 to 80, the modal correlation between the estimated γ's and their true values increases to approximately 0.99 and the modal MSE value for these parameters drops to about 0.01. Once again, we observe that increasing the number of objects to be rated (J) has a greater impact on the precision of the γ estimates than increasing the number of raters (I). In the simulations with I = 40 and J = 40, the modal correlations between the estimated θ's and their true values are around 0.9, and the mode of the MSEs is around 0.25. As I increases to 80 and J increases to 80, the modal correlation between the estimated θ's and their true values increases to approximately 0.97 and the modal MSE value for these parameters drops to about 0.05. We again observe that accurate estimation of θ depends on both I and J to a greater extent than does accurate estimation of γ. In sum, across all the simulation repetitions, we consistently observe high correlations between the estimated parameters and the true values, and the MSEs of the estimated parameters are low relative to the standard deviation of the parameter values. Moreover, as I and J increase, the samplers for both versions of the new model perform better at latent variable estimation. The detailed simulation study results are reported in the supplemental information document.
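The two summary measures used above are straightforward to compute. A self-contained sketch with made-up numbers (the helper names are ours):

```python
def mse(estimates, truth):
    """Mean squared error between parameter estimates and true values."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)

def pearson_corr(x, y):
    """Pearson correlation between parameter estimates and true values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

# Illustrative values only, not results from the paper:
truth = [0.1, -0.4, 0.9, 0.3]
est = [0.2, -0.5, 0.8, 0.4]
print(round(mse(est, truth), 2))        # 0.01
print(pearson_corr(est, truth) > 0.95)  # True
```

In the simulation study these quantities would be computed once per simulated data set (comparing posterior estimates against the known generating values), and the modes reported across the 50 repetitions.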

APPLICATION: THE PERCEIVED TRUTHFULNESS OF PUBLIC COVID‐19 STATEMENTS

In this section we apply our new model to the substantive application of how survey respondents perceive the truthfulness of public statements about COVID‐19. Replication data and code are archived at the Harvard Dataverse, https://doi.org/10.7910/DVN/KBAJJO.

COVID‐19 statements

Since we are interested in the extent to which members of the mass public accurately assess the truthfulness of statements about COVID-19, it is important that we use fact-checked statements so as to have an independent measure of the truthfulness of each statement. Our source of these fact-checked COVID-19 statements is the website https://www.politifact.com. PolitiFact's Editor-in-Chief, Angie Drobnic Holan, gave us permission to use the PolitiFact data for this survey in an email on 11 May 2020. PolitiFact catalogues a range of statements that have political content. According to PolitiFact's own website: ‘Each day, PolitiFact journalists look for statements to fact check. We read transcripts, speeches, news stories, press releases and campaign brochures. We watch TV and scan social media. Readers send us suggestions via email; we often fact-check statements submitted by readers. Because we cannot feasibly check all claims, we select the most newsworthy and significant ones (Holan, 2020)’. PolitiFact journalists fact check these statements and categorize the truthfulness of each statement into one of six categories (from most truthful to least truthful): true, mostly true, half true, mostly false, false, pants on fire (Holan, 2020). We selected 42 statements with the intent of balancing the truthfulness of the statements and the slant of the statements (left, neutral and right). These statements were made between 22 February 2020 and 8 May 2020. Ideally, we would have used equal numbers of left-, neutral- and right-leaning statements from all six truthfulness categories. However, some categories were sparsely populated and we were forced to dichotomize the truthfulness categories into high truth (true, mostly true and half true) and low truth (pants on fire, false and mostly false). This gave us seven statements in each of the 3 × 2 combinations of slant × truthfulness.
The full set of 42 statements along with their truthfulness ratings and slant is presented in the supplemental information document.

The survey

The survey was conducted on 8 July 2020. Respondents were recruited from the Lucid Marketplace (https://luc.id/marketplace/). Quotas were used to make the sample approximate the US voting age population. The survey was conducted online using the Qualtrics interface. In this survey, respondents were asked to report their view of the relative truthfulness of COVID‐19 statements given to them in randomly selected pairs of statements. Figure 2 depicts what this looks like for one randomly selected pair of statements. After the paired comparisons of COVID‐19 statements were given to respondents, the respondents were asked a sequence of demographic, attitudinal and behavioural questions.
FIGURE 2

Screen shot of COVID‐19 statement comparison survey

We removed 187 respondents that Lucid flagged as having a high likelihood of being fraudulent. We received usable responses from 2,621 respondents. On average, each respondent gave us their view of the relative truthfulness of just less than 15 pairs of randomly selected statements. We provide more information about our survey sample in Appendix B. In addition, the Supplemental Information provides descriptive statistics on our respondent sample.

Results

Before presenting results from our new model, we present results from simple unidimensional models. By comparing and contrasting the results from the simple unidimensional models and the new model, we show that the new model leads to more interpretable and insightful findings.

Results from unidimensional models

As a starting point, we fit the simple Thurstone model,

Pr(y_{ijj′} = 1) = Φ(θ_j − θ_{j′}),

to the pairwise comparisons data from our survey. Here Φ(·) is the CDF of a standard normal distribution. To identify the model, we constrained θ_{1014} to be negative (statement 1014 is a neutral-valence, low-truth statement) and constrained θ_{1015} to be positive (statement 1015 is a neutral-valence, high-truth statement). These constraints are consistent with an interpretation of the latent dimension as objective truthfulness. The remaining θ parameters were assumed to have independent standard normal prior distributions. The MCMC sampler was run for 120,000 iterations with the first 20,000 discarded as burn-in iterations. Every 10th iteration was stored. Inspection of the output reveals that this simple model provides a poor fit to the observed data. For instance, for each observed y_{ijj′} we calculate the in-sample posterior expectation of a correct classification:

(1/M) Σ_{m=1}^{M} [ y_{ijj′} Φ(θ_j^{(m)} − θ_{j′}^{(m)}) + (1 − y_{ijj′}) (1 − Φ(θ_j^{(m)} − θ_{j′}^{(m)})) ],

where m = 1, …, M indexes the MCMC draws. Note that a ‘correct’ classification is simply defined to be a classification equal to the observed response; it is not necessarily related to whether respondent i accurately perceived the true truthfulness of statement j relative to statement j′. The average of these posterior expectations of a correct response, taken over all observed y_{ijj′}s, is 0.52. We can also aggregate to the statement level by averaging over respondents. Doing this, we see that the average probability of a correct classification across all statements is also 0.52, and that no statement has a probability of being correctly classified greater than 0.56. If we aggregate to the respondent level by averaging over the statement pairs seen by each respondent, we see that the average probability of a correctly classified response by a respondent is also 0.52. Furthermore, we find that 26% of respondents have probabilities of a correctly classified statement less than 0.5 and only 0.3% of respondents (8 of 2,621) have probabilities of a correctly classified statement greater than 0.6.
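The quantity above is easy to compute from stored MCMC draws. A hedged sketch with hypothetical draws (the function names are ours, not from the replication code):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def correct_classification_prob(y, theta_j_draws, theta_k_draws):
    """In-sample posterior expectation of a correct classification for one
    observed comparison y, averaged over the stored MCMC draws of theta."""
    M = len(theta_j_draws)
    total = 0.0
    for tj, tk in zip(theta_j_draws, theta_k_draws):
        p = normal_cdf(tj - tk)          # Pr(y = 1) at this draw
        total += p if y == 1 else 1.0 - p
    return total / M

# Hypothetical draws for a statement pair where the observed response is y = 1:
draws_j = [0.6, 0.4, 0.5]
draws_k = [-0.2, 0.0, -0.1]
print(correct_classification_prob(1, draws_j, draws_k) > 0.5)  # True
```

Averaging this quantity over all observed comparisons, over the comparisons involving a given statement, or over the comparisons seen by a given respondent yields the overall, statement-level and respondent-level fit summaries reported above.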
We also examine how the posterior means of the θ parameters correlate with the objective truth and partisan valence of the statements. To do this we give a ‘pants-on-fire’ statement a value of 0, a ‘false’ statement a value of 1, a ‘mostly false’ statement a value of 2, a ‘half-true’ statement a value of 3, a ‘mostly true’ statement a value of 4 and a ‘true’ statement a value of 5. We then calculated the Spearman rank correlation between these truthfulness ratings and the posterior means of the θ parameters. This produced a rank correlation of 0.42. Similarly, we gave right-valence statements a value of 1, neutral-valence statements a value of 0, and left-valence statements a value of −1. Then we calculated the Spearman rank correlation between the partisan valence of the statements and the posterior means of the θ parameters. This resulted in a rank correlation of −0.17. The simple unidimensional Thurstone model produces estimates of the statement-specific parameters that are only weakly correlated with objective truth and even more weakly correlated with the other factor that we expect to structure responses: the political valence of the statements. As noted above, a natural extension of the basic Thurstone model is to introduce a respondent-specific parameter that allows for differential ability to perceive differences between statements. This produces the model:

Pr(y_{ijj′} = 1) = Φ(β_i (θ_j − θ_{j′})).

We fit this model to the pairwise comparisons data from our survey. To identify the model, we again constrained θ_{1014} to be negative and constrained θ_{1015} to be positive. Again, these constraints are consistent with the latent dimension being interpreted as objective truthfulness. The remaining θ parameters were assumed to have independent standard normal prior distributions. The β parameters were also assumed to have independent standard normal priors. The sign of the β parameters was not restricted. The MCMC sampler was run for 120,000 iterations with the first 20,000 discarded as burn-in iterations.
Every 10th iteration was stored. If we calculate the in-sample posterior expectation of a correct classification for this model in the analogous way that we did for the simple Thurstone model, we find that the average of these posterior expectations of a correct response, taken over all observed y_{ijj′}s, is 0.55. While the inclusion of the respondent-specific β parameters ensures that the respondent-level predictions match the observed data at least 50% of the time, it is still the case that 35% of the observed y_{ijj′}s have posterior probabilities of a correct classification less than 0.50. At the statement level, we see that, on average, statements are classified correctly 55% of the time, with only 2 of 42 statements having a probability of correct classification greater than 0.6. The Spearman rank correlation between the posterior means of the statement-specific θ parameters and the objective truthfulness of the statements is 0.32, which is lower than in the simple Thurstone model. However, the rank order correlation between the posterior means of θ and the partisan valence of the statements is −0.73. Even though we constrained the model so that a neutral-valence, high-truth statement was to the right of 0 and a neutral-valence, low-truth statement was to the left of 0, the resulting estimates of θ are more strongly correlated with the partisan valence of the statements than with the objective truthfulness of the statements. Indeed, this warping of truthfulness at the respondent level can be seen in the posterior means of the respondent-specific β parameters: 38% of respondents have a β parameter with a posterior mean less than 0. In other words, 38% of respondents are, on average, viewing objective truth as subjective falsity, and vice versa.
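The Spearman correlations reported in this subsection pair the six-category truthfulness coding with the posterior means of θ. A small self-contained sketch with illustrative numbers (not the survey data; helper names are ours):

```python
def ranks(values):
    """Average ranks (1-based), handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# PolitiFact categories coded 0 ('pants on fire') through 5 ('true'),
# against hypothetical posterior means of theta for six statements:
truth_codes = [0, 1, 2, 3, 4, 5]
theta_means = [-0.9, -0.2, -0.5, 0.1, 0.6, 0.4]
print(round(spearman(truth_codes, theta_means), 2))  # 0.89
```

Because Spearman only uses ranks, it captures the monotone association between the ordinal truthfulness categories and the continuous θ estimates without assuming the category spacing is meaningful.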

Results from the two‐dimensional Dirichlet process model

The results from the simple unidimensional models are not fully satisfying. The Thurstone model does a poor job of representing observed patterns in the data, and produces estimates of statement-specific parameters that only weakly correlate with objective truthfulness. The inclusion of a respondent-specific parameter slightly improves model fit at the expense of weakening the already weak correlation between the statement-specific parameter estimates and objective truth. We fit the two-dimensional Dirichlet process model discussed in Section 2.2.2 to the data in the hope that it provides a better fit than the unidimensional models. To identify the model, we constrained θ_{1015} to be equal to 0.25 on the first dimension and greater than 0 on the second dimension (statement 1015 is a neutral-valence, high-truth statement); we constrained θ_{1004} to be less than 0 on the first dimension (statement 1004 is a right-valence, low-truth statement); and we constrained θ_{1042} to be greater than 0 on the first dimension (statement 1042 is a left-valence, high-truth statement). The remaining θ parameters were assumed to have independent bivariate normal prior distributions with mean 0 and variance–covariance matrices equal to identity matrices. The centring distribution G₀ was set to a uniform distribution on [0, π/2] and the concentration parameter α was assumed to follow a Gamma prior distribution. The MCMC sampler was run for 440,000 iterations with the first 40,000 discarded as burn-in iterations. Every 40th iteration was stored. Calculating the in-sample posterior expectation of a correct classification for this model in the analogous way that we did for the unidimensional models above, we find that the average of these posterior expectations of a correct response, taken over all observed y_{ijj′}s, is 0.57. The respondent-level predictions and statement-level predictions also match the observed data 57% of the time. These numbers are slightly better than the values of 0.52 and 0.55 we achieved with the unidimensional models.
However, this slight improvement in in‐sample predictive accuracy is not the main advantage of the two‐dimensional Dirichlet process model. The main advantage is that it allows us to uncover a more nuanced understanding of how certain types of respondents assess the truthfulness of statements. More specifically, in this application it allows us to see how some respondents make assessments of truthfulness based on their partisanship or political ideology, while other respondents seem to be more guided by the objective truthfulness of the statements. As a starting point, consider Figure 3. This figure plots the posterior means of the statement‐specific parameters for the j = 1, …, 42 statements along with g(0.18), g(0.77), and g(1.41), where 0.18, 0.77, and 1.41 are the minimum, median, and maximum of the posterior means of γ_i for i = 1, …, N. Figure 3 thus allows us to see how three types of respondents (those with γ equal to 0.18, 0.77, and 1.41) perceive the truthfulness of the statements.
FIGURE 3

Posterior means of the statement‐specific parameters and the g(γ) vectors at the minimum, median, and maximum posterior means of γ_i. In each panel, the points correspond to the posterior means of the parameters for the 42 statements. The arrows correspond to the g(γ) vectors at the minimum, median, and maximum posterior means of γ_i for i = 1, …, N. In panel (a), the points are shaded based on the objective truthfulness of the statements. In panel (b), the points are colour‐coded based on the left–right valence of the statements. Finally, in panel (c), the points are coded according to both the objective truthfulness and the left–right valence of the statements. Note that projecting the points onto g(0.77) (0.77 is the median posterior mean of γ_i) produces values associated, albeit weakly, with the objective truthfulness of the statements. On the other hand, projecting the points onto g(0.18) and g(1.41) yields values where higher values correspond to more left‐leaning and more right‐leaning valence, respectively

Respondents with γ parameters near the median of 0.77 project the statement‐specific parameters onto a dimension that correlates, albeit weakly, with the objective truthfulness of the statements. On the other hand, respondents with γ parameters near the minimum of 0.18 project the statement‐specific parameters onto a dimension that is positively correlated with the leftward valence of the statements. Finally, respondents with γ parameters near the maximum of 1.41 project the statement‐specific parameters onto a dimension that is positively correlated with the rightward valence of the statements.
While the information in Figure 3 is useful, it does not provide information on three important things: (a) the distribution of respondent‐specific parameters, (b) the precise strength of the correlation between particular projections and the objective truthfulness of statements and (c) the precise strength of the correlation between particular projections and the left–right valence of the statements. This information is displayed in Figure 4.
FIGURE 4

Histogram of the posterior means of γ_i for i = 1, …, N, along with the Spearman rank correlations between the projections of the statement‐specific parameters onto g(γ) and both objective truthfulness and left–right valence, for various values of γ. Note that respondents whose posterior mean γ parameter is near 0.77 tend to assess statements primarily based on the objective truthfulness of the statements, but this association is weak (correlation slightly greater than 0.4). Respondents with γ parameters closer to the extremes of 0.18 and 1.41 assess the truthfulness of the COVID‐19 statements in ways that are strongly associated with the left–right valence of the statements. Further, respondents with γ parameters greater than approximately 1.1 not only assess the truthfulness of the COVID‐19 statements such that right‐valence statements are perceived as more truthful, they also do so in ways that are negatively correlated with the objective truthfulness of the statements. All correlations were calculated across all J = 42 statements

Looking at Figure 4, three things are apparent. First, most respondents have estimated γ parameters near the median posterior mean value of 0.77 (the modal estimate of γ is just to the right of 0.77). A smaller number of respondents have lower estimated γ parameters (about 16% are below 0.5), and an even smaller number have much larger ones (about 13% are above 1.0). Second, Figure 4 presents the correlation between the projections of the statement‐specific parameters onto g(γ) and the objective truthfulness of the statements for various values of γ. This is the multidimensional analogue of the correlation between the statement‐specific parameter estimates and objective truthfulness from the unidimensional models discussed in Section 4.3.1.
Once again, we give a ‘pants‐on‐fire’ statement a value of 0, a ‘false’ statement a value of 1, a ‘mostly false’ statement a value of 2, a ‘half‐true’ statement a value of 3, a ‘mostly true’ statement a value of 4 and a ‘true’ statement a value of 5. We then calculated the Spearman rank correlation between these truthfulness ratings and the projections of the posterior means of the statement‐specific parameters onto g(γ), for the 42 statements and 15 equally spaced values of γ from 0.18 to 1.41. This produces the 15 colour‐coded correlations at the top of Figure 4. What we see here is that the γ value inducing the highest correlation with the objective truth of the statements is γ = 0.70, which yields a correlation of 0.45. γ values less than or equal to 0.88 give rise to correlations with objective truth of at least 0.28. However, individuals with γ values greater than or equal to 1.14 tend to rate COVID‐19 statements in ways that are negatively correlated with the objective truth of the statements. Third, Figure 4 presents the correlation between these projections and the left–right valence of the statements. As above, we gave right‐valence statements a value of 1, neutral‐valence statements a value of 0, and left‐valence statements a value of −1. We calculated the Spearman rank correlation between these left–right valence ratings and the projections of the posterior means of the statement‐specific parameters onto g(γ), again for the 42 statements and 15 equally spaced values of γ from 0.18 to 1.41. This produces the 15 colour‐coded correlations at the very top of Figure 4. The resulting pattern of correlations with left–right valence is stark. Respondents with the highest values of γ, say at or above 1.14, perceive the truthfulness of the COVID‐19 statements in a way that positively correlates with the rightward valence of the statements, with correlations of 0.77 or above. These same individuals' evaluations of the truthfulness of the COVID‐19 statements are negatively correlated with objective truth.
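The correlation sweep just described can be reproduced schematically as follows. This is a sketch only: the statement coordinates and truth codes below are invented placeholders (the real analysis uses the 42 estimated statements), and g(γ) = (cos γ, sin γ) is our assumption about the projection direction.

```python
import math

def avg_ranks(xs):
    """Average ranks (ties share their mean rank), as used by Spearman's rho."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    ra, rb = avg_ranks(a), avg_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)

# Placeholder 2-d posterior-mean statement parameters and 0-5 truth codes.
z_hat = [(math.cos(0.3 * j), math.sin(0.7 * j)) for j in range(42)]
truth = [j % 6 for j in range(42)]

# 15 equally spaced gamma values from 0.18 to 1.41, as in the text.
gammas = [0.18 + t * (1.41 - 0.18) / 14 for t in range(15)]
sweep = {}
for gam in gammas:
    proj = [z[0] * math.cos(gam) + z[1] * math.sin(gam) for z in z_hat]
    sweep[round(gam, 3)] = spearman(proj, truth)
```

The same sweep, run with the −1/0/1 valence codes in place of the truth codes, gives the second row of colour-coded correlations in Figure 4.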
On the other hand, individuals with the lowest values of γ, say at or below 0.35, perceive the truthfulness of the COVID‐19 statements in a way that negatively correlates with the rightward valence of the statements, with correlations of −0.70 or below. These individuals' evaluations of the truthfulness of the COVID‐19 statements are weakly positively correlated with the objective truth of the statements. To summarize, respondents with the modal value of γ respond primarily to the objective truthfulness of the COVID‐19 statements when evaluating pairs of statements; they do not rate statements in ways that are correlated with the left–right valence of the statements. That said, there is only a weak correlation between the objective truth of the statements and their subjective perceptions. Respondents with γ values at the two extremes, by contrast, rate the truthfulness of statements in ways that are strongly associated with the left–right valence of the statements: respondents with low values of γ tend to see left‐leaning statements as more truthful, while respondents with high values of γ tend to see right‐leaning statements as more truthful. For these respondents, the left–right valence of the statements matters more than their objective truth. Indeed, those who tend to see right‐leaning statements as more truthful perceive the truthfulness of the statements in ways that are slightly negatively correlated with objective truth.

We also examine how respondent perceptions of COVID‐19 statement truthfulness, as measured by their estimated γ parameters, correlate with respondent characteristics and behaviours.
Figure 5 plots the relationship between the respondent‐specific γ estimates and three measures related to the political attitudes of respondents: partisanship (operationalized as an indicator of whether a respondent self‐identifies as a strong Republican), ideology (operationalized as respondent self‐placement on a 7‐point Likert‐type scale running from 1 = ‘very liberal’ to 7 = ‘very conservative’) and the slant of news media consumption (operationalized as respondent self‐reports of their preferred news outlet combined with the media bias ratings from https://www.allsides.com/media‐bias/media‐bias‐ratings).
FIGURE 5

Associations between posterior means of γ_i for i = 1, …, N and respondent partisanship, ideology, and slant of news media consumption. The dark orange lines are the posterior means of local regression predictions averaged over the posterior distribution of γ. The light orange band is the pointwise central 95% credible region for these local regression predictions, again averaged over the posterior distribution of γ. Each panel of this figure plots a local regression estimate of the conditional expectation function of the variable in question on γ_i for respondents i = 1, …, N. Each panel was constructed by fitting M local regressions of the variable in question on each of the M posterior samples of γ. The pointwise average of these M estimated regression functions is the dark orange line in each panel. The light orange band in each panel is the pointwise central 95% credible region for these local regressions (the empirical 2.5th and 97.5th pointwise percentiles of the M estimated regression functions)

Not surprisingly, inspection of Figure 5 reveals that right‐wing partisanship, ideology, and news media consumption are all increasing in γ. Respondents with the largest values of γ tend to be those with the most right‐leaning political views, while those with the lowest γ values tend to be the most left‐leaning.

We also examine whether respondent‐specific γ values (and thus the perceptual framework that respondents use to evaluate the truthfulness of COVID‐19 statements) are associated with behaviours important for public health. More specifically, Figure 6 plots the relationship between the respondent‐specific γ estimates and (a) a measure of a lack of social distancing (operationalized as a 0/1 indicator equal to 1 if a respondent said that 21 or more people were 6 feet or closer to them in the past week), and (b) a measure of mask wearing (operationalized as the number of situations, out of nine possible, in which the respondent said they wear a mask).
The panels are constructed in the same way as Figure 5.
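The band construction described in the Figure 5 caption (M local regressions, one per posterior draw, then pointwise summaries) can be sketched as below. This is a minimal stand-in, not the authors' code: a simple Nadaraya-Watson kernel smoother replaces whatever local regression they used, and all respondent data are simulated.

```python
import math
import random

def kernel_smooth(x, y, grid, h=0.15):
    """Nadaraya-Watson kernel smoother: a simple stand-in for the
    local regressions used to build Figures 5 and 6."""
    out = []
    for g0 in grid:
        w = [math.exp(-0.5 * ((g0 - xi) / h) ** 2) for xi in x]
        sw = sum(w)
        out.append(sum(wi * yi for wi, yi in zip(w, y)) / sw)
    return out

random.seed(1)
N, M = 200, 50                    # respondents, posterior draws (simulated)
outcome = [random.gauss(0, 1) for _ in range(N)]  # e.g. centred ideology
draws = [[random.uniform(0.18, 1.41) for _ in range(N)] for _ in range(M)]
grid = [0.18 + t * (1.41 - 0.18) / 39 for t in range(40)]

# One smoother per posterior draw of gamma, then pointwise summaries.
fits = [kernel_smooth(d, outcome, grid) for d in draws]
mean_line = [sum(f[t] for f in fits) / M for t in range(40)]       # dark line
band = []
for t in range(40):
    vals = sorted(f[t] for f in fits)
    band.append((vals[int(0.025 * M)], vals[int(0.975 * M) - 1]))  # ~95% band
```

Averaging the M fitted curves pointwise gives the dark line, and the empirical 2.5th/97.5th pointwise percentiles give the credible band, exactly as the captions describe.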
FIGURE 6

Associations between posterior means of γ_i for i = 1, …, N and self‐reported social‐distancing and mask‐wearing behaviour. The dark orange lines are the posterior means of local regression predictions averaged over the posterior distribution of γ. The light orange band is the pointwise central 95% credible region for these local regression predictions, again averaged over the posterior distribution of γ

Figure 6 shows that the structure underlying how respondents judge the truthfulness of COVID‐19 statements (as measured by their γ values) is associated with behaviours that have consequences for public health. Specifically, lack of social distancing is increasing in γ, while mask wearing is decreasing in γ.

DISCUSSION

In this paper, we have proposed a new pairwise comparison model to measure multidimensional latent attributes and respondent‐specific perceptual parameters. This model incorporates interpretable constraints on the respondent‐specific parameters and explicitly models how object attributes on different dimensions are aggregated into a respondent's choices. The new model improves upon previous models by providing an easily interpretable framework for characterizing and estimating respondent‐specific differences in perception along with the multidimensional latent attributes of objects. We fit the model using MCMC methods; software for fitting it is freely available in the MCMCpack R package (Martin et al., 2011). To illustrate the strengths of the new model, we have applied it to both simulated data and original survey data. The simulation studies show that the model is able to recover the true values of the latent variables from observed binary choice data. In the survey data application, our analysis sheds light on how statements about COVID‐19 are perceived by respondents and on which respondent characteristics are associated with the perceptual frameworks respondents use. Importantly, we find only a weak correlation between the actual truthfulness of a statement and respondents' perceptions of its truthfulness. More importantly, we find that the political valence of statements accounts for much of the variation in perceived truthfulness: co‐partisanship between a respondent and the speaker of a statement predicts higher perceived truthfulness. The respondent‐specific parameters estimated by the new model also bear out these general patterns in respondents' perceptions, as well as the associations between perceptions and behaviours. Our findings show that individuals generally have a hard time differentiating truthful information about COVID‐19 from false information.
Moreover, many respondents rely on partisanship as a cue to gauge the truthfulness of information about COVID‐19. Among these partisan respondents, the most rightward‐leaning tend to view objectively truthful statements as subjectively false. Finally, we also observe associations between the respondent‐specific perceptual parameters and respondents' mask‐wearing and social‐distancing practices.

It is important to note that there are limitations to our work. As with many other latent attribute models, proper use of our new models requires subject matter expertise at a number of points. The models are only identifiable after constraints are placed on the model parameters. While the constraints are, to some degree, arbitrary, some choices will result in more easily interpretable results than others; subject matter expertise should inform these decisions. Relatedly, in applications that focus on the rater‐specific parameters, the objects selected for rating determine the estimand and thus affect the results. These decisions should also be informed by domain‐specific knowledge. There are also limitations to our COVID‐19 application. The conclusions we reach are based on a sample of respondents from July 2020, and the statements they evaluated were from the early days of the pandemic; we are thus only able to make inferences about public perceptions in this period. We were also limited in the number of statements we could use. As our simulation studies showed, we could have increased the precision of our estimates had we been able to use more statements. Finally, we only examined public perceptions of the truthfulness of statements about COVID‐19, so we are not able to say how these perceptions compare with perceptions of statements in other policy areas such as economic policy. More work is required to fully realize the potential of our proposed models.
First, additional work is warranted on the question of how to most efficiently allocate pairs of items to raters. We suspect that an active learning approach may be dramatically more efficient than the simple randomization scheme used in our survey. Second, while it is clear how to extend our model to latent spaces with three or more dimensions, efficiently fitting such extended models may require modifications to our MCMC algorithm. Finally, we think there is room for more work on model evaluation within this class of models.
Similar articles (5 in total)

1.  Pairwise comparison versus Likert scale for biomedical image assessment.

Authors:  Andrew S Phelps; David M Naeger; Jesse L Courtier; Jack W Lambert; Peter A Marcovici; Javier E Villanueva-Meyer; John D MacKenzie
Journal:  AJR Am J Roentgenol       Date:  2015-01       Impact factor: 3.959

2.  Looking deathworthy: perceived stereotypicality of Black defendants predicts capital-sentencing outcomes.

Authors:  Jennifer L Eberhardt; Paul G Davies; Valerie J Purdie-Vaughns; Sheri Lynn Johnson
Journal:  Psychol Sci       Date:  2006-05

3.  Reliability Concerns in Measuring Respondent Skin Tone by Interviewer Observation.

Authors:  Lance Hannon; Robert DeFina
Journal:  Public Opin Q       Date:  2016-04-15

4.  A multidimensional pairwise comparison model for heterogeneous perceptions with an application to modelling the perceived truthfulness of public statements on COVID-19.

Authors:  Qiushi Yu; Kevin M Quinn
Journal:  J R Stat Soc Ser A Stat Soc       Date:  2022-03-21       Impact factor: 2.175

5.  Age differences in perception and awareness of emotion.

Authors:  Michelle B Neiss; Lindsey A Leigland; Nichole E Carlson; Jeri S Janowsky
Journal:  Neurobiol Aging       Date:  2007-12-21       Impact factor: 4.673

