Jolien Cremers1, Kees Tim Mulder1, Irene Klugkist1,2. 1. Department of Methodology and Statistics, Utrecht University, The Netherlands. 2. Research Methodology, Measurement and Data Analysis, Behavioural Sciences, University of Twente, Enschede, The Netherlands.
Abstract
The interpretation of the effect of predictors in projected normal regression models is not straight-forward. The main aim of this paper is to make this interpretation easier such that these models can be employed more readily by social scientific researchers. We introduce three new measures: the slope at the inflection point (bc ), average slope (AS) and slope at mean (SAM) that help us assess the marginal effect of a predictor in a Bayesian projected normal regression model. The SAM or AS are preferably used in situations where the data for a specific predictor do not lie close to the inflection point of a circular regression curve. In this case bc is an unstable and extrapolated effect. In addition, we outline how the projected normal regression model allows us to distinguish between an effect on the mean and spread of a circular outcome variable. We call these types of effects location and accuracy effects, respectively. The performance of the three new measures and of the methods to distinguish between location and accuracy effects is investigated in a simulation study. We conclude that the new measures and methods to distinguish between accuracy and location effects work well in situations with a clear location effect. In situations where the location effect is not clearly distinguishable from an accuracy effect not all measures work equally well and we recommend the use of the SAM.
The interpretation of the effect of predictors in projected normal regression models is not straight-forward. The main aim of this paper is to make this interpretation easier such that these models can be employed more readily by social scientific researchers. We introduce three new measures: the slope at the inflection point (bc ), average slope (AS) and slope at mean (SAM) that help us assess the marginal effect of a predictor in a Bayesian projected normal regression model. The SAM or AS are preferably used in situations where the data for a specific predictor do not lie close to the inflection point of a circular regression curve. In this case bc is an unstable and extrapolated effect. In addition, we outline how the projected normal regression model allows us to distinguish between an effect on the mean and spread of a circular outcome variable. We call these types of effects location and accuracy effects, respectively. The performance of the three new measures and of the methods to distinguish between location and accuracy effects is investigated in a simulation study. We conclude that the new measures and methods to distinguish between accuracy and location effects work well in situations with a clear location effect. In situations where the location effect is not clearly distinguishable from an accuracy effect not all measures work equally well and we recommend the use of the SAM.
Circular models are models for data with a circular outcome variable. A circular variable measures a direction in two‐dimensional space in degrees or radians and requires analysis methods that are different from standard methods for linear data. In the field of psychology circular data can be found in research on the visual perception of space (Matsushima, Vaz, Cazuza, & Ribeiro Filho, 2014), moving room experiments (Stoffregen, Bardy, Merhi, & Oullier, 2004), visual working memory experiments (Heyes, Zokaei, & Husain, 2016), movement synchronization (Kirschner & Tomasello, 2009; Ouwehand & Peper, 2015) and cognitive maps (Brunyé, Burte, Houck, & Taylor, 2015). Measurements on the interpersonal circumplex can also be regarded as circular data (König, Onnen, Karl, Rosner, & Butollo, 2016; Santos, Vandenberghe, & Tavares, 2015; Wright, Pincus, Conroy, & Hilsenroth, 2009; Zilcha‐Mano et al., 2015). In general, circular variables measure directions, are circumplex scales, or are a measure of periodic (weekly, daily, hourly, etc.) patterns. Wright et al. (2009) outline how circular data are used in research using circumplex measures. They introduce how to compute a circular mean and test for differences between groups. However, circular models could be used much more effectively in psychological science. Take, for instance, a study by Locke, Sayegh, Weber, and Turecki (2017) on interpersonal characteristics of depressed outpatients. Circumplex profiles of patients were made and compared with those of normative samples. Although the authors do use circular means and variances and do compare groups, a more elaborate circular model would allow for the simultaneous evaluation of multiple predictors. It would then be possible to assess whether there still is a difference in circumplex profile between depressed and normative samples when also accounting for other variables such as gender, age or a measure of depression severity.Analysing circular data is not straightforward and special methods are required. One of the approaches to circular data is the so‐called embedding approach. In this approach projected distributions are used. Projected distributions are bivariate distributions defined in and projected onto the circle. Other approaches to modelling circular data are the wrapping and intrinsic approaches (Mardia & Jupp, 2000). These approaches are based on respectively wrapping distributions defined in onto the circle and using distributions defined on the circle itself, such as the von Mises distribution. A paper by Rivest, Duchesne, Nicosia, and Fortin (2016) provides an overview of some of the circular regression models in the literature. Although various types of models for the three approaches have been described, we will only consider regression models of the embedding approach in this paper.Nuñez‐Antonio, Gutiérrez‐Peña, and Escarela (2011) and Wang and Gelfand (2013) have developed Bayesian methods for regression based on the projected normal and general projected normal distributions. These are both based on Presnell, Morrison, and Littell (1998) who first used a projected normal distribution to analyse circular regression models. In the embedding approach it is relatively easy to fit more complex models because the distributions that are used are based on distributions in . Indeed, in the literature more complex regression type models have been introduced, including random effects models (Nuñez‐Antonio & Gutiérrez‐Peña, 2014) and spatial and spatio‐temporal models (Mastrantonio, Lasinio, & Gelfand, 2016; Wang & Gelfand, 2014). However, we will limit ourselves to the simpler multiple regression model.Although the embedding approach is flexible with regard to model fitting, the interpretation of the effects of predictors in these models is not easy. According to Maruotti (2016) this is the major drawback of models based on projected distributions. Currently, when using the embedding approach we obtain two regression coefficients for each variable in the model. This is a result of using an underlying bivariate distribution. The two coefficients are, however, not interpretable as an effect on the circle. Additionally, they do not allow us to directly distinguish between an effect on the mean and an effect on the spread of the circular outcome. In the example of the study by Locke et al. (2017) we would not be able to distinguish between the differences between depressed and normative samples in average profiles and differences in the within‐group homogeneity of the profiles. There is no literature yet dealing with this problem. We will therefore introduce new interpretation tools that combine the bivariate coefficients into one circular coefficient. The new tools allow us to assess whether there is an effect of a predictor on the mean of the circular outcome and how large this effect is.In Section 2 we describe the embedding approach. We distinguish between effects on the circular mean and spread in Section 3. Next, in Section 4, we combine two bivariate coefficients into one circular coefficient and introduce new tools for interpretation. These tools are then applied to the example data set in Section 5. Lastly, we perform a simulation study to assess the new tools in Section 6. The paper concludes with a discussion in Section 7.
Embedding approach
In this section we introduce the embedding approach and projected normal distribution. Both have been introduced and described previously (Nuñez‐Antonio & Gutiérrez‐Peña, 2005; Presnell et al., 1998). Therefore, a large part of this section focuses on the interpretation of the estimates from a projected normal regression model.
Projected normal distribution
There are several representations in which one can specify circular data: angles, polar coordinates or unit vectors in . In this paper we will refer to an observation of a circular outcome variable in angles as θ, where i = 1, … , n, and to its unit vector representation as u
. We assume that the outcome variable can be represented by an unobserved column vector y
in as follows: where r
is the length of the vector y
. The angular outcome variable originates from a projection onto the circle of a vector in . If we assume that the underlying y
originate from a bivariate normal distribution with mean μ and variance–covariance matrix I, it follows from equation (2) that θ has a projected normal (PN) distribution with density function where θ is the circular outcome variable measured in radians (), is the mean vector of the distribution, the variance–covariance matrix I is an identity matrix, and . The terms and denote the cumulative distribution function and the probability density function of the standard normal distribution, respectively. An identity variance–covariance matrix is chosen to identify the model. Due to this configuration the PN distribution is always unimodal and symmetric. Another configuration can be found in Wang and Gelfand (2013) who use different constraints on the matrix resulting in the general PN distribution that can also take a skewed and multimodal shape.To fit a model on the mean of the projected normal distribution μ we need r
to obtain the unobserved bivariate normal vectors y
. The estimation of r
is a missing‐data problem that is solved by treating the unobserved lengths r
as latent or auxiliary variables in the model. We can then use existing techniques such as the EM algorithm (Presnell et al., 1998) or Bayesian methods (Nuñez‐Antonio & Gutiérrez‐Peña, 2005), to obtain inference on the y
.
Regression
In regression we have independent observations of a vector of linear predictors x
for each individual i = 1, … , n. The model for one of the bivariate normal vectors y
has mean structure , where and each β is a vector with intercept and regression coefficients. The first component of x
equals 1 so as to be able to estimate an intercept. Formally, this notation is only correct when the predictors in x
are equal for both components of μ
. The model then has the same structure as a multivariate regression model. The dimensions of and are, however, allowed to differ.Even though our main interest lies in effects on the circular mean, the mean structure of y
can influence both the mean and spread of a circular outcome. Henceforth we call an effect on the mean of a circular outcome a location effect and an effect on the spread an accuracy effect. The size of the Euclidean norm of μ influences the circular spread. The larger it is, the smaller the spread on the circle. The consequences of this property of μ for the interpretation of results from a PN regression model will be considered later.To estimate PN regression models a Bayesian Markov chain Monte Carlo (MCMC) procedure is used in which r
and B are sampled. The procedure is based on Nuñez‐Antonio et al. (2011) and Hernandez‐Stumpfhauser, Breidt, and van der Woerd (2017), a diffuse normal prior is used for B and the exact method of sampling is described in Appendix A.
Location and accuracy effects
There are several types of circular effects: a location effect, an accuracy effect, a mix of these or no effect. Because PN regression models are consensus models the location and spread of the circular outcome are modelled simultaneously (Rivest et al., 2016). This means that we can create measures for checking whether a predictor has a location or an accuracy effect in a PN regression model. A measure to check for a location effect can be constructed by using a regression line in . Considering one predictor in a PN regression model, predicted values on the first and second bivariate component ( and ) are determined as follows:
where and are intercepts, and are regression coefficients of a particular predictor on the two bivariate components and x is a predictor value. Whether this regression line runs through the origin determines the type of circular effect it represents. In Figure 1 we see two regression lines in with a unit circle. The regression line on the left passes through the origin. The regression line on the right does not pass through the origin. We also see circular predicted values, the grey dots on the unit circle. The circular predicted values lie very close together in the plot on the left, while in the plot on the right they move counterclockwise on the circle when the predictor value increases. The plot on the left represents an accuracy effect; the circular predicted values do not change with the predictor. The plot on the right represents a location effect.
Figure 1
A bivariate regression line with linear predicted values (open dots) for x = (−3, −2, −1, 0, 1, 2, 3) and a unit circle with circular predicted values (closed dots) on the circle for an accuracy (left) and location effect (right).
A bivariate regression line with linear predicted values (open dots) for x = (−3, −2, −1, 0, 1, 2, 3) and a unit circle with circular predicted values (closed dots) on the circle for an accuracy (left) and location effect (right).To assess whether the regression line runs through the origin we first compute the shortest distance (SDO) from the regression line in bivariate space to the origin. The SDO is computed as This is the Euclidean norm of the predictor value of the point where the line of predictions in is closest to the origin (). Because SDO ≥ 0 we give it a sign such that its posterior is defined on the entire real line. We call this new parameter the signed shortest distance to the origin (SSDO). The following equation shows how to determine whether the sign should be positive or negative: The parameter () in this equation is the circular predicted value of . The computation of and is outlined in Section 4.4. The function is defined in equation (4). An intuition for equation (5) is given in the Supporting Information of this paper. An example of how to use the SSDO in practice will be given in Section 5. How well it performs at distinguishing accuracy and location effects is investigated in a simulation study in Section 6.
Quantifying location effects for continuous predictors
In this section we show how to compute circular predicted values and make predicted circular regression curves for a marginal effect. Subsequently we outline new tools for quantifying location effects.
Circular predicted values
Using the two‐argument arctangent function, , we compute predicted values on a circular scale, , as follows: Here and are predicted values on the two components for a vector of predictor values .
Predicted regression curves
To visualize the circular effect, we compute a predicted circular regression curve for a marginal effect. For the marginal effect of one linear predictor with the values of the other predictors set to zero we specify and . We fill out these functions for different values of x and the intercepts and coefficients are estimated. Figure 2 shows a regression curve for one predictor together with original data points. The y‐axis of this plot contains the predicted outcome, , in degrees and the x‐axis contains values for x with a range equal to the data range. This plot illustrates the effect of the predictor on the circular outcome.
Figure 2
Predicted circular regression curve for the relation between a linear predictor and a circular outcome together with the original data points. The square indicates the inflection point of the regression curve.
Predicted circular regression curve for the relation between a linear predictor and a circular outcome together with the original data points. The square indicates the inflection point of the regression curve.By investigating a marginal effect all predictors except one are set to a specific value. In our case they are set to zero. For continuous variables we centre the predictors and therefore zero refers to the mean value. For categorical variables zero refers to the baseline category. As in logistic regression the values to which the other predictors in the model are set influence the marginal effect we observe for the predictor of interest.
A reparametrization for regression models
To quantify the slopes of circular regression curves we propose a reparametrization of equation (5). From this reparametrization we suggest three types of circular regression coefficients. The proposed reparametrization of equation (5) is as follows:Here and are the linear intercepts and and are the linear coefficients of one continuous predictor variable x. The parameters , and describe a predicted circular regression curve, such as the one in Figure 2. The parameters and describe the location of the inflection point of the regression curve on the axis of the circular outcome and the axis of the predictor, respectively. The inflection point occurs at the value of the predictor for which the regression line in is closest to the origin. Hence, is both the predictor value of the point on the regression line in bivariate space that lies closest to the origin as well as the location of the inflection point of the circular regression curve on the axis of the predictor. In Figure 2, the inflection point is indicated by a square. The parameter describes the slope of the tangent line at the inflection point.
Parameter derivation
The x‐coordinate of the inflection point (a
)
To obtain , the derivative of the Euclidean norm of the point where the line of predictions in is closest to the origin is solved for 0. We find This is the location of the inflection point on the axis of the predictor x.
The y‐coordinate of the inflection point ()
To obtain , we insert into equation (6): This is the location of the inflection point on the axis of the predicted circular outcome.
The slope at the inflection point ()
We may solve the reparametrization from equation (6) for to obtainNote that for this is undefined. We can simplify the formula by plugging in : This is the slope of the tangent line at the inflection point and a first circular regression coefficient. This coefficient is conceptually similar to the standard regression coefficient in von Mises regression models (Fisher & Lee, 1992). Like the SSDO, this regression coefficient can be used as an indicator of a location effect. In a simulation study in Section 6 we investigate how these indicators, or SSDO, perform at detecting location effects.
Additional quantification measures
The slope is not necessarily a good measure for all data sets. For some data the inflection point of the regression curve does not lie near the data. In that case, can take on a large range of different values while in the asymptotes the regression curve is still a good approximation to the data. This means that in some cases represents a very unstable extrapolated effect. Then it is much more interesting to investigate the slope of the regression curve near the data. We get the slope at a specific predictor value by taking the derivative of equation (6) for x and plugging in the value for which we want to know the slope: An intuitive value for which we want to know the slope is the mean, . We obtain the slope at the mean (SAM) as where is as computed in equation (10). We interpret this measure as saying that at , a one‐unit increase in x results in a SAM increase in .Another measure that we are interested in is the averaged slope (AS) over each data point. We compute slopes for all data values of x and average these slopes: We interpret this measure as saying that on average, a one‐unit increase in the predictor, x, results in a AS increase in .
Empirical example
To illustrate the problems that occur when interpreting output from a PN model we fit a regression model to a data set collected by Brunyé et al. (2015), the ‘pointing north data’. In their study, 200 Tufts University students divided across 10 data collection sites were asked to point north. Pointing angles relative to the magnetic north (pointing errors) were recorded as the outcome variable and several predictor variables were measured. The outcome variable was measured such that the real north was located at θ = 0°. One of the predictor variables is self‐reported spatial ability. This was measured using the Santa Barbara Sense of Direction questionnaire (SBSOD). Other predictors were age, experience with living on campus and gender (0 = male, 1 = female). Table 1 shows descriptives for these data.
Table 1
Descriptives for the pointing north data with linear mean and standard deviation (SD) for continuous variables and mean direction () and mean resultant length () for circular variables
Mean/θ¯
SD/R¯
Minimum
Maximum
Type
Pointing error
19.57°
0.43
–
–
Circular
Age
19.68
1.29
18.00
23.00
Continuous
Experience
1.79
1.21
0.00
4.00
Continuous
SBSOD
4.10
1.06
1.47
6.80
Continuous
Gender
–
–
–
–
Categorical
M
16.32°
0.56
–
–
F
24.33°
0.32
–
–
Note that only absolute pointing errors are provided online with the original paper. The pointing error used here was obtained by an iterative process multiplying the absolute errors by a random vector containing 1's and −1's and computing the mean resultant length () until a data set was found in which was equal to 0.43, that of the original data. The mean direction was then set to 19.57, the mean direction of the original data.
Descriptives for the pointing north data with linear mean and standard deviation (SD) for continuous variables and mean direction () and mean resultant length () for circular variablesNote that only absolute pointing errors are provided online with the original paper. The pointing error used here was obtained by an iterative process multiplying the absolute errors by a random vector containing 1's and −1's and computing the mean resultant length () until a data set was found in which was equal to 0.43, that of the original data. The mean direction was then set to 19.57, the mean direction of the original data.Before analysis all continuous predictor variables were centred at 0. This affects the intercept but not the coefficients and will make the interpretation of individual effects more intuitive. When using the embedding approach we are predicting the mean vector of the PN distribution for the pointing error. The prediction equations for the pointing north data are: Table 2 shows estimates from the regression model that was fitted after careful monitoring of convergence (iterations = 3,000, burn‐in = 1,000). Appendix B shows histograms of the posterior distributions of the linear intercepts and coefficients. The interpretation of the regression coefficients is the same as in linear regression. For SBSOD, if the self‐reported spatial ability increases by one unit, the predicted outcome on the first bivariate component increases by 0.25 and the predicted outcome on the second bivariate component increases by 0.27. For the categorical variable, gender, we may interpret the coefficients as follows: for females the predicted outcome on the first bivariate component is 0.48 lower than for males. Because our circular outcome concerns compass data the only interpretation we can give to the bivariate components is that of a north–south and an east–west axis. The coefficients do not give us information on the change in the predicted pointing error on the circle, a circular effect, nor do they give us information on whether it is a location or accuracy effect. Next, we compute and interpret the circular regression coefficients introduced in Section 4 and the SSDO for the pointing north data.
Table 2
Posterior modes and 95% highest posterior density lower (LB) and upper bounds (UB) for the regression coefficients of the pointing north data. An asterisk (*) indicates that a highest posterior density interval does not contain 0
Component I
Component II
Mode I
LB I
UB I
Mode II
LB II
UB II
Intercept
0.95
0.72
1.20*
0.15
−0.07
0.41
Age
−0.07
−0.29
0.14
0.00
−0.23
0.21
Gender
−0.48
−0.73
−0.10*
0.17
−0.18
0.47
Experience
0.22
−0.00
0.45
−0.02
−0.23
0.23
SBSOD
0.25
0.09
0.40*
0.27
0.07
0.38*
Posterior modes and 95% highest posterior density lower (LB) and upper bounds (UB) for the regression coefficients of the pointing north data. An asterisk (*) indicates that a highest posterior density interval does not contain 0
Circular effects
Before interpreting the circular regression coefficients we use the linear coefficients to determine whether there is any location or accuracy effect of a predictor on the pointing error. To do so we use the linear regression coefficients for a predictor (e.g., and ). If one or both regression coefficients of the two components are different from 0 there is an accuracy and/or location effect. To check whether this holds we can use the highest posterior density (HPD) intervals of both linear coefficients. If they do not both contain zero we conclude that there is an effect of the predictor on the circle. From Table 2 we may thus conclude that only SBSOD and gender have an effect on the circle.For the categorical variable gender we can compare the predicted angle for females, , and for males, . Females thus have a higher pointing error than males on average. If we compute such predicted angles for each iteration of the MCMC sampler we can also compute HPD intervals for this effect.For the continuous variables of the pointing north data, Table 3 shows the posterior modes (PM) and the upper (UB) and lower bounds (LB) of the HPD intervals for the circular coefficients and the SSDO. We use the SSDO to determine whether the effect of SBSOD on the pointing error is an accuracy or a location effect. Again we use the HPD interval. If the HPD interval of the SSDO of a predictor does not include 0 we conclude that the bivariate regression line does not run through the origin and that there is a location effect. For SBSOD the HPD interval does not include zero. This means that the effect of SBSOD on the pointing error is a location effect.
Table 3
Posterior modes (PM) and 95% HPD interval lower (LB) and upper bounds (UB) for the circular regression coefficients and SSDO for the continuous variables of the pointing north data. An asterisk (*) indicates that an HPD interval does not contain 0
Age
Experience
SBSOD
PM
LB
UB
PM
LB
UB
PM
LB
UB
bc
0.16
−2.57
2.80
−0.35
−7.04
6.50
0.52
0.09
3.48*
SAM
−0.02
−0.22
0.24
−0.03
−0.31
0.18
0.17
0.00
0.40*
AS
0.05
−0.21
0.26
−0.10
−0.36
0.27
0.23
0.01
0.42*
SSDO
−0.76
−1.06
1.00
1.30
−1.87
2.07
−1.08
−2.11
−0.59*
Posterior modes (PM) and 95% HPD interval lower (LB) and upper bounds (UB) for the circular regression coefficients and SSDO for the continuous variables of the pointing north data. An asterisk (*) indicates that an HPD interval does not contain 0Next, we investigate the circular regression coefficients for SBSOD shown in Table 3. For SBSOD, all three circular coefficients have HPD intervals that do not contain 0. This means that SBSOD has a location effect and is what we expected after concluding that the SSDO for SBSOD was different from zero. Figure 3 shows the relation between SBSOD and the pointing error. The posterior mode of the slope at the inflection point of the predicted circular regression curve, , for SBSOD is equal to 0.52. This means that at the inflection point, as SBSOD increases by one unit the pointing error increases counterclockwise, by 0.52 · 180/π ≈ 29.8°, keeping all other predictors at zero. However, the inflection point lies almost outside the data range so we would rather interpret the AS or SAM. On average an increase of one unit in SBSOD results in a counterclockwise increase in pointing error of , keeping all other predictors at zero. At the average SBSOD an increase of one unit results in a counterclockwise increase in pointing error of , keeping all other predictors at zero.
Figure 3
Predicted circular regression curve for the relation between SBSOD and the pointing error, together with the original data points. The square indicates the inflection point of the regression curve.
Predicted circular regression curve for the relation between SBSOD and the pointing error, together with the original data points. The square indicates the inflection point of the regression curve.Even though they do not have a circular effect, it is interesting to look at the parameter estimates of age and experience. The results show estimation issues for both predictors, exemplified by the wide HPD intervals of . When we look at the posterior histograms of the circular regression coefficients and SSDO for age and experience in Figure B2 in Appendix B we also see estimation issues. Whereas the histogram of shows that there are some posterior estimates with extreme positive or negative values, the histograms for AS and SAM do not. These extreme values are probably the cause of the wide HPD intervals. Additionally, such estimation issues may occur when we try to estimate a location effect in a situation where there is no or just a very small location effect. Judging from the data plotted along the predicted regression curve in Figure 4, this may well be the case for the experience variable. In Section 6 we will illustrate what happens when we try to estimate a location effect in data where there is no or almost no location effect. Note, however, that experience does seem to influence the spread of the pointing error. Figure 5 shows the relation between experience and the concentration, which is the reciprocal of the spread, of the predicted values on the circle. The concentration was computed using the formula for the mean resultant length from Kendall (1974). In the Supporting Information predicted circular regression plots as well as figures showing the effect on the concentration are provided for all continuous variables in the pointing north data.
Figure B2
Posterior histograms for the circular regression coefficients and the SSDO for the age (left) and experience (right) variables of the pointing north data.
Figure 4
Predicted circular regression curve for the relation between experience and pointing error, together with the original data points. The inflection point of the regression curve is not shown as this point lies outside the range of the data.
Figure 5
Relation between experience and the concentration of the predicted values on the circle.
Predicted circular regression curve for the relation between experience and pointing error, together with the original data points. The inflection point of the regression curve is not shown as this point lies outside the range of the data.Relation between experience and the concentration of the predicted values on the circle.
Simulation study
To assess the performance of the circular coefficients , SAM and AS and the ability to distinguish between location and accuracy effects we conducted a simulation study with 1,225 designs with one predictor. Of these designs 1,056 were classified as location designs, 144 as accuracy and 25 as having no effect. Because the last category is so small its results are excluded from the simulation study. A description of the designs is given in Section 6.1. In Section 6.2 a summary of the simulation results is given, and in Section 6.3 we try to explain the causes of patterns observed in these results. A more detailed description of the simulation study and the results can be found in the Supporting Information for this paper.
Design
In each design different population values were chosen for the linear intercepts and and the regression coefficients and . From these values, the population values of the parameters a
, a
, b
, SAM, AS, SDO and SSDO were computed. For each design 1,000 data sets were simulated, 500 with N = 50 and 500 with N = 200. Each data set contains one circular outcome θ and one linear predictor . The relation between predictor and outcome was determined by the chosen values for the linear intercepts and coefficients. Before analysis of a data set the linear predictor was centred at 0. After analysis the relative bias, frequentist coverage of the HPD interval and average interval width (AIW) of the estimates for b
, SAM, AS and SSDO for each design were computed.
Results
In this section we briefly summarize the results from the simulation study with regard to how well we can detect location and accuracy effects and the performance of the MCMC sampler in estimating the circular coefficients. We will especially focus on the designs in which the measures did not perform optimally.If there is any effect this can be found in more than 90% of the data sets of a design. This holds in all design categories, location and/or accuracy. The indicators for location effects that were tested, and SSDO, work well for accuracy and location effects with . For location effects with the indicators perform worse. All indicators perform better in designs with a larger sample size.Concerning the performance of the MCMC sampler in estimating the circular regression coefficients we can say that designs with a larger sample size and larger SSDO in most cases perform better in terms of relative bias, coverage and AIW. In accuracy designs the coefficients have smaller relative biases compared to location designs with an SSDO close to 0. In general AS and SAM have lower relative bias than . In terms of coverage the AS shows slight undercoverage. The SAM has slight undercoverage in location designs and overcoverage in accuracy designs. The log AIW is largest for accuracy designs and the parameter . This seems to correspond to the estimates for the example data.
Explaining patterns
To explain the patterns in relative bias, coverage and log AIW we will show what happens in the estimation of circular regression coefficients of an exemplary design. The exemplary design is a location design with an SSDO of 0.24 and a sample size of 50. The results for this design are summarized in Table 4.
Table 4
Simulation results of 500 simulated data sets () from a population with , , and
Parameter
Population value
Posterior mode
Bias
LB
UB
Coverage
ax
−1.53
−1.46
−0.07
−1.91
−1.17
0.94
ac
1.82
1.80
0.02
−1.28
1.89
0.96
bc
−8.50
−2.45
−6.05
−72.62
65.01
0.95
AS
−0.41
−0.29
−0.12
−0.75
0.45
0.95
SAM
−0.05
−0.05
0.00
−0.17
0.06
0.96
SSDO
0.24
0.27
−0.02
−0.17
0.81
0.95
LB and UB refer to the averaged lower and upper bounds of the 95% HPD intervals.
Simulation results of 500 simulated data sets () from a population with , , andLB and UB refer to the averaged lower and upper bounds of the 95% HPD intervals.Figure 6 shows histograms for the posterior modes for , , AS and SAM for all 500 data sets. Notice that the histograms for , and AS are bimodal. The bimodality is caused by the estimated that switches to the other side of the circle. How often switches sides is determined by the SDO. When the SDO is zero, in an accuracy design, is equally likely to switch to either side of the circle and the histograms of modes will all be bimodal and symmetrical. When the SDO is large, the will almost never switch to the opposite side of the circle and the histograms of modes will all be unimodal and symmetrical. In both cases the symmetry of the histograms of modes around the true value results in little bias in the estimate of and AS. In designs with a small SDO the histogram of modes is bimodal but not symmetrical. This causes bias in and AS. The bimodality problem does not occur in the SAM, which explains the lower relative bias. Because of this property and its interpretation we prefer the SAM over and AS.
Figure 6
Histograms of the posterior modes of 500 simulated data sets (N = 50) from a population with , , and for the parameters a
, b
, AS and SAM.
Histograms of the posterior modes of 500 simulated data sets (N = 50) from a population with , , and for the parameters a
, b
, AS and SAM.In Section 5 we observed that for some iterations of the MCMC sampler the estimated is either extremely negative or extremely positive. The extremes are caused by the tangent in the formula for in equations (9) and (10). The tangent function has asymptotes at 0.5π radians and at radians. If or is close to either one of these asymptotes we get extreme estimates for . These extremes cause the AIWs to be large, as can be seen in Table 4. Large AIWs cause the coverage of the designs with small SDOs to be as good as or better than in designs with larger SDOs and lower the ability to detect location effects in the designs with smaller SDOs.
Discussion
The main contribution of this paper is to simplify the interpretation of effects in projected normal regression models. In the previous literature only the bivariate coefficients for a predictor were given, without much indication of how to properly interpret these. Therefore, we have developed methods for assessing circular effects. These methods allow us to interpret and quantify the effect of a predictor on a circular outcome. A simulation study has shown that the performance of the methods is good in designs with an easily detectable location effect. If the location effect is harder to detect, the performance of the methods worsens. We need to increase the sample size to get more power and better performance. The slope at the mean (SAM) has an intuitive interpretation and performs best. The performance of the other circular coefficients seems to depend on the type of effect they are computed for. Therefore we recommend researchers to use the SAM and carefully investigate the circular regression plots and posterior histograms before using or AS.Additionally, we have investigated the ability of our method to detect different types of circular effects. If there is any effect it can usually be picked up by the linear coefficients. The coefficient and SSDO perform equally well at detecting location effects. Assessing whether there is a location effect is harder in smaller samples, especially if it is a location effect with a small SDO. It is recommended that researchers make sure they have a large enough sample size to be able to detect the effect they are interested in. To be able to precisely say what sample size is needed more research needs to be done. From the present research we conclude that for a PN regression model with one continuous linear predictor a sample of 200 is large enough to be able to detect most location effects with small SDO.The ability of the PN regression model to by default detect an effect, on the mean or spread, is an advantage over regression models of the intrinsic or wrapping approach to circular data because we do not need to fit two separate models. The model we use allows for investigation of the posterior distributions of the linear coefficients to check whether a location or accuracy effect is present in the data.Although this paper focuses on regression models for the PN situation, we may consider using the tools introduced here in more complex models or in models using the general projected normal (GPN) distribution. One possibility is to use them in models where the mean of the PN distribution is partly composed of basis functions of the covariates (e.g., polynomials). Because locally polynomials look like a straight line, our tools might be applied here as well. Another example of a more complex model is the mixed‐effects model that Nuñez‐Antonio and Gutiérrez‐Peña (2014) propose. In this model all tools introduced in this paper can be used on fixed effects. For random effects we can also consider these tools, but we have to be cautious in interpreting circular random effects. Their interpretation depends on the point relative to which the circular random effects are computed; the individual random intercepts or the average intercept. We are in this case also interested in the spread of the random effects. To assess the spread new tools will have to be developed. In GPN models the tools introduced here will be more complex to interpret and to implement. For skewed data interpretation problems could be overcome, but for bimodal data we would, for example, have to choose at what mode of the data we want to compute the SAM. In a regression model this seems redundant as we would usually try to explain possible bimodality by including predictors in the model (e.g., having different means for men and women). This can already be done in a PN model. Tools for a GPN model would also be hard to implement. There is no analytical solution for obtaining the mean direction of GPN models. This complicates the computation of circular predicted values. It is possible to get these using Monte Carlo integration (Wang & Gelfand, 2014), and we may also be able to compute the slope of a circular regression curve and the tools proposed in this paper in a similar way. However, the behaviour of these regression coefficients should then be investigated thoroughly, especially their behaviour and use in a model where the GPN distribution is bimodal and/or skewed.In conclusion, this paper has contributed to our knowledge about interpreting effects in projected normal regression models. We have outlined how to assess whether a predictor has an effect on either the spread or the mean of the circular outcome. But most importantly we have found a way of quantifying an effect on the mean of a circular outcome. These methods allow us to directly assess the effect of a predictor on the circle. In our opinion this has removed the major drawback of the projected normal regression model.Appendix S1. Circular interpretation of projected normal regression coefficients: Supplementary material.Click here for additional data file.
Authors: Sigal Zilcha-Mano; Kevin S McCarthy; Ulrike Dinger; Dianne L Chambless; Barbara L Milrod; Lauren Kunik; Jacques P Barber Journal: J Consult Clin Psychol Date: 2015-06-01
Authors: Roxanne S Beltran; Jessica M Kendall-Bar; Enrico Pirotta; Taiki Adachi; Yasuhiko Naito; Akinori Takahashi; Jolien Cremers; Patrick W Robinson; Daniel E Crocker; Daniel P Costa Journal: Sci Adv Date: 2021-03-17 Impact factor: 14.136
Authors: Marta Suárez-González; Carlos Bousoño-García; Santiago Jiménez-Treviño; Juan José Díaz-Martín Journal: Nutrients Date: 2021-03-28 Impact factor: 5.717