
Dynamic Bayesian adjustment of anticipatory covariates in retrospective data: application to the effect of education on divorce risk.

Parfait Munezero¹, Gebrenegus Ghilagaber¹.

Abstract

We address a problem in inference from retrospective studies where the value of a variable is measured at the date of the survey but is used as a covariate for events that occurred long before the survey. This causes problems because the value of the current-date (anticipatory) covariate does not follow the temporal order of events. We propose a dynamic Bayesian approach that models the anticipatory covariate and the event of interest jointly and allows the effects of the anticipatory covariate to vary over time. The issues are illustrated with data on the effects of education attained by survey time on divorce risks among Swedish men. The overall results show that failure to adjust for the anticipatory nature of education leads to elevated relative risks of divorce across educational levels. The results are partially in accordance with previous findings based on analyses of the same data set. More importantly, our findings provide new insights in that the bias due to anticipatory covariates varies over marriage duration.


Keywords: Anticipatory covariates; Bayesian inference; Sweden; current-date covariates; dynamic modelling; educational gradients in divorce-risks; observational studies; particle filter; retrospective surveys

Year: 2020 PMID: 35707119 PMCID: PMC9041697 DOI: 10.1080/02664763.2020.1864812

Source DB: PubMed Journal: J Appl Stat ISSN: 0266-4763 Impact factor: 1.416

Introduction

Anticipatory (current-date) covariates are variables whose values refer to what has been attained by the date of the survey but are used to explain life-course behaviour that took place before the survey. Highest educational level and social class attained at survey time are typical examples. Such variables are common in retrospective studies because the data collection focuses on, say, birth or employment histories but contains no history of educational careers or social-class mobility. The use of anticipatory covariates causes problems in inference because they are treated as fixed variables although they are inherently time-varying. Consider, for instance, a retrospective survey where the interest is in differentials in the risk of divorce across educational levels attained before marriage, but the only available information is respondents' highest educational level at the time of the survey. We have information on the age of individuals and the year in which they achieved the reported highest educational level, but we do not know their educational level at the time of marriage, on which we want to base our investigation. At least some must have had a lower educational level at the time of marriage than the one they reported at the time of the survey, but we do not know how much lower it was. The main goal of this paper is to propose a method for restoring the temporal order of the educational level and, thereby, disentangling the effects due to misclassification of the educational level into wrong categories from those attributable to real differences in educational levels. The two recent studies that attempted to adjust for the anticipatory nature of educational level [9,10] are based on the assumption of proportional risks (constant relative risks) of divorce for educational levels across marriage durations.
In the present study, we propose a dynamic Bayesian adjustment model that (i) allows covariate effects to vary over time and (ii) adjusts for the bias due to misclassification of anticipatory educational levels. Our proposed methodology models the risk of divorce and the process of educational level transitions jointly. We use a reversed time Markov chain process to estimate the most probable educational level before exposure to the event of interest. These estimated values are then used to compute adjusted effects of the covariate on the event under investigation. Results show that ignoring the anticipatory nature of education leads to increased educational gradients in divorce risks, with a significant difference between those with secondary and tertiary education. This difference disappears when we adjust for the anticipatory nature of education, and we then find no significant differences in divorce risks across educational levels. In Section 2, we introduce the data set and the specific problem to be addressed. In Section 3, we present the proposed dynamic Bayesian adjustment model. In Section 4, we fit the model, present our results and compare them with those from previous analyses of the same data set. We summarise our findings by way of discussion in Section 5.

The data set and our specific problem

The data set

The data set on which we base our illustration is a subset from a mail survey of Swedish men in which data on attitudes towards children and family were collected from about 3200 men [17]. Apart from attitudes and family plans, the survey also collected other information such as parental background and respondents' highest educational level attained at the time of the survey. We have chosen these data for our illustration because they were analysed in previous studies [9,10] with which we want to compare our results. The usable records for our present case are those of 1312 ever-married men who had either divorced before the survey date (events) or were still married (censored). The variable of interest for our present purpose is educational level and its effects on the risk of divorce. Table 1 presents the data classified by whether the individual completed his reported educational level before he married (the non-anticipatory cases) or after he married (the anticipatory cases).

Table 1.

Distribution of sample across when education was completed.

Completed	Education		Status
		Still married	Divorced	Total	% divorced
Before marriage	Primary	371	71	442	16
(non-anticip.)	Secondary	433	55	488	11
	Post Secon.	116	21	137	15
	Sub-total	920	147	1067	14
After marriage	Primary	–	–	–	–
(anticipatory)	Secondary	66	28	94	30
	Post-Secon.	120	31	151	21
	Sub-total	186	59	245	24
	Total	1106	206	1312	16

Thus 245 of the 1312 respondents (close to 19%) are anticipatory cases in the sense that they completed their highest reported educational level after they married. These are also shown to the right of the diagonal line in Figure 1. As would be expected, all of those who reported primary level education at the time of the survey completed this level before they married (indicated by blue ‘+’ in Figure 1). Among those who reported secondary or post-secondary education at survey time, on the other hand, some attained the reported educational level after they married. From Table 1, 94 of the 582 respondents with secondary level education (about 16%; the green ‘o’ in Figure 1) and 151 of the 288 respondents with post-secondary education (about 52%; the red ‘x’ in Figure 1) are anticipatory cases.

Figure 1.

Distribution of the data by age at marriage and at completion of reported education. The observations below the main diagonal are anticipatory cases.

Overall, 442 of the 1312 respondents (about 34%) reported primary level education, 582 (about 44%) reported secondary level education, while the remaining 288 (about 22%) reported having achieved post-secondary level education by survey time. It can also be noted that out of the entire sample of 1312 respondents (who had at least primary level education by survey time), 870 (about 66%) progressed to secondary level education. Of these, 288 (about 33%) progressed further to post-secondary level education. These figures are relevant for comparison with our estimated transition rates between educational levels to be presented in Section 4.2.
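The shares quoted in this section can be recomputed directly from the counts in Table 1. The following is a minimal sketch; the variable and function names are illustrative and do not come from the paper.

```python
# Counts copied from Table 1 (education level reported at survey time).
non_anticipatory = {"primary": 442, "secondary": 488, "post_secondary": 137}
anticipatory = {"primary": 0, "secondary": 94, "post_secondary": 151}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Totals per reported level and for the whole sample.
totals = {k: non_anticipatory[k] + anticipatory[k] for k in non_anticipatory}
n = sum(totals.values())
n_anticipatory = sum(anticipatory.values())

# Share of each reported level in the sample, and share of anticipatory cases.
share_by_level = {k: pct(v, n) for k, v in totals.items()}
share_anticipatory = pct(n_anticipatory, n)

# Share of anticipatory cases within the two levels where they occur.
anticipatory_within_level = {
    k: pct(anticipatory[k], totals[k]) for k in ("secondary", "post_secondary")
}

# Progression: primary -> secondary, then secondary -> post-secondary.
reached_secondary = totals["secondary"] + totals["post_secondary"]
progress_to_secondary = pct(reached_secondary, n)
progress_to_post_secondary = pct(totals["post_secondary"], reached_secondary)
```

Rounding to one decimal is a presentational choice made here; the article appears to have quoted rounded percentages.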

The specific problem

Our specific problem is that some individuals achieved their reported highest educational level after marriage and, in fact, a smaller proportion completed the reported educational level after they divorced. The remaining question is what to do with these respondents, whose educational level does not follow the temporal order of the event of interest. The simplest option is to discard them from further analyses as in [2]. We label this approach the reduced model in the following sections but argue that it leads to loss of information and to bias due to potential selection. Another option, which is common practice in the literature, is to ‘blind oneself’ and proceed with analyses of the entire data set, ignoring the anticipatory nature of the covariate. We call this alternative the anticipatory model in subsequent sections. We argue that this approach also leads to biased estimates of educational gradients in divorce because potential changes in educational levels (between exposure to the risk and the event or censoring) are ignored. We therefore propose a model that uses a reversed time Markov chain process to estimate the most probable educational level before exposure to the risk of divorce. These estimated values are then used to compute adjusted effects of education on divorce risk. This approach is what we call the adjusted model in the rest of the paper. Other approaches that rely on data imputation include [14-16,18], while [5,9,10] are examples of approaches based on joint modelling of the risk and the education process.

Dynamic Bayesian adjustment of anticipatory covariates

Dynamic hazards model

We observe data , where represents the observed marriage duration for individual i which is either event time (years between marriage and divorce) or a censoring time (years between marriage and the survey date). The censoring indicator, takes the value 1 if the ith respondent was divorced by the survey time and the value 0 if he was still married (censored). The is the highest education level reported at the time of the survey and have one of three levels (1 for primary level, 2 for secondary level and 3 for post-secondary level). The and are, respectively, the dates the ith respondent married and completed his highest reported education. We also introduce a variable Z which is the educational level achieved by the date of marriage. In the reduced and anticipatory models, Z = x, while in the adjusted model, Z is a latent variable whose value needs to be estimated together with other parameters of interest. If Z is known and fixed, the marriage duration is modelled by assuming a continuous rate of divorce, where , are functions of time representing, respectively, the baseline hazards and the effects of education levels on divorce risk at marriage duration t, if , and otherwise, for l = 1, 3. Here, secondary (l = 2) is set as the reference level. One common approach of parametrizing time-varying covariate's effects is the semi-parametric piece-wise exponential (PE) model which expresses them using a Gaussian random walk process [6,12,23]. This is a special case of the spline representation described in [4,13,22]. The PE model partitions the time interval (in our case, marriage duration) into smaller intervals, , , where , and assumes that effect parameters and baseline hazard functions are constant within each interval but can vary between intervals. Thus, for our case, we define (where ) and for . 
The continuous hazard function in Equation (1) is, therefore, discretized in terms of interval-specific hazards in the log-linear form: where , and is the new regression parameter-vector within interval and contains an intercept . The PE assumption allows to express the likelihood function in interval as a product of the contributions of individuals in that interval [6]: where is defined in Equation (2), is the number of men at risk in (who are still married at the end of ), , is an interval-specific censoring indicator ( if event occurs in and otherwise), and are, respectively, the vectors of education profiles at marriage date and exposure times of men at risk in . As we can note from Equation (4) the likelihood functions in each interval are direct functions of the number of events and the exposure times . Thus misclassification of the events and/or exposure times into wrong marriage intervals or, most importantly, into wrong levels of the covariate – as is the case with our anticipatory education – will lead to incorrect estimates of the parameters. This, in turn, can potentially ruin the purpose of the analysis. Further, a plot of life-table estimates of the hazards of divorce for the three educational levels (see the supplemental material) shows that the hazards cross each other across marriage duration – thereby violating the proportional hazards assumption. Below, we propose a method that both relaxes the proportional hazards assumption and adjusts for the anticipatory nature of the educational level.
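To make the role of events and exposure times concrete, here is a minimal sketch of the log-likelihood contribution of a single interval under a piecewise-exponential model with a log-linear hazard. It uses the standard form in which each man at risk contributes (event indicator × log-hazard) minus (exposure × hazard). The function names, the `log_hazard` mapping and the toy data are assumptions for illustration, not the paper's notation or code.

```python
import math


def exposure_in_interval(duration, start, end):
    """Time an individual spends at risk inside the interval (start, end]."""
    return max(0.0, min(duration, end) - start)


def interval_log_likelihood(durations, events, levels, start, end, log_hazard):
    """Log-likelihood contribution of one interval (piecewise-exponential model).

    log_hazard maps an education level to the log-hazard in this interval,
    i.e. intercept plus the level's effect (zero effect for the reference level).
    """
    total = 0.0
    for t, d, z in zip(durations, events, levels):
        e = exposure_in_interval(t, start, end)
        if e <= 0.0:
            continue  # not at risk in this interval
        # The event counts here only if it falls inside (start, end].
        event_here = 1 if (d == 1 and start < t <= end) else 0
        eta = log_hazard[z]
        total += event_here * eta - e * math.exp(eta)
    return total
```

Assigning a man to the wrong education level changes which `log_hazard` entry his event and exposure feed into, which is how misclassification of an anticipatory covariate distorts the interval-specific estimates.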

Joint likelihood for the marriage duration and the latent educational level

To adjust for the anticipatory education level, we express the joint likelihood of marriage duration, T, and the latent education level achieved by marriage date, Z, for respondent i as Here, is the conditional likelihood function (4), and models the latent education level achieved by the date of marriage given the highest education level reported at the survey time. To model Z, we define the probability that the level of education of respondent i at his marriage date was l given that he reported an education level at the survey time, as a function of the time between the date of marriage and the date of completion of the reported highest educational level : We model Z by tracing backward in time the paths of education level progress taking into account that education is a non-decreasing process. This means, given the highest education level x = k reported at the survey time, the plausible education level that could have been attained by marriage time is . Thus can be expressed in terms of the transition probability matrix for k = 1, 2, 3 (rows) and l = 1, 2, 3 (columns). Note that the probabilities sum to 1 in each row and that the primary education-level is an absorbing state because, based on evidence from the data, all individuals who reported to have primary-level education at the survey time have completed it before they married. For a given row in Equation (7), Z follows a categorical distribution where are indicators of the education levels ( if and otherwise).
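The structural constraints on the transition matrix described above (rows sum to one, the level at marriage cannot exceed the reported level, primary is absorbing) and the categorical draw of Z can be sketched as follows. The function names are hypothetical, and any matrix passed in is an illustrative input rather than an estimate from the paper.

```python
import random


def is_valid_reversed_matrix(P, tol=1e-9):
    """Check the constraints for rows k (reported level) and columns l (level at marriage)."""
    for k in range(3):
        if abs(sum(P[k]) - 1.0) > tol:
            return False  # each row must sum to 1
        for l in range(3):
            if P[k][l] < 0 or (l > k and P[k][l] != 0):
                return False  # level at marriage cannot exceed the reported level
    return P[0][0] == 1.0  # primary is an absorbing state


def sample_level_at_marriage(k, P, rng=random):
    """Draw Z in {1, ..., k} from the categorical distribution in row k of P."""
    levels = [1, 2, 3]
    return rng.choices(levels, weights=P[k - 1], k=1)[0]
```

For example, with the made-up matrix `[[1, 0, 0], [0.3, 0.7, 0], [0.1, 0.5, 0.4]]`, a respondent reporting secondary education (k = 2) can only be assigned level 1 or 2 at marriage.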

Estimation of the transition probabilities

We assume that x and Z are realizations from a continuous Markov chain , where and are, respectively, the birth date and the survey date, and is defined on an outcome space of three states, representing the three levels of education. This means and where, as defined before, and denote the date at which the reported highest education level was attained and the date of marriage, respectively. Because x, and are observed, we only need to focus on the paths of the chain in the time frame to make inference about Z. Since (for the anticipatory cases), and by definition , it follows that Z can be obtained from reversing all paths of the chain passing by the state at time . This induces a reversed Markov chain with initial state and final state which is the most plausible educational level achieved by the date of marriage. Note that the time frame can be translated to , which allows to express the reversed Markov chain as implying that and . Assuming it is not possible to skip an education level, transitions in can only occur between two consecutive states in a decreasing way as shown in the diagram below. The process can begin either in or and it is assumed that during an infinitesimal small time , reversed transitions from the state to the state occur at a rate of , while those from to (the absorbing state) occur at a rate of . Here, the rate is defined as the number of men completing the educational level in question per unit time. Therefore, both and are defined on . The Markovian assumption implies that the probability of being in the state at a time depends only on the current state at time s, which can be expressed as where is the probability of being in the state at a specified time, and is the transition probability from the state k to the state l during an infinitesimal small time . 
In matrix notation, the expression in Equation (9) can be written as where is a matrix of transition probabilities , and is a vector of the marginal probabilities of the states at a specified time. In order to complete the adjusted model, we need to define the transition probabilities . We define them considering two facts: (1) transitions can only occur between consecutive states and (2) the states are not instantaneous, which means that before a transition occurs the process stays in the current state for some time. These probabilities can be modelled, following Gross et al. [11], as These can then be used to estimate the conditional probabilities defined in Equation (7) as follows (see the supplemental material for derivation and more details on how these transition probabilities are estimated). Thus, our final adjusted model is the joint likelihood, where are the estimates in Equation (12) using , and if the reported highest education level is and otherwise. In the adjusted model, Z can either be the reported highest level or a lower level depending on the value of s. For instance, if a respondent reported post-secondary education at survey but we see that this was attained 2 years after marriage (s = 2), then his educational level at marriage date would be either post-secondary or secondary. The probability of primary, in this case, would be nearly zero as it requires more than 2 years to complete both secondary and post-secondary levels. The adjusted model in Equation (13) can be seen as a generalization of the reduced and anticipatory models since they can be obtained as special cases by setting in Equation (13).
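As an illustration of how such transition probabilities behave, the sketch below uses the textbook solution for a three-state continuous-time chain that can only move 3 → 2 → 1 with exponential holding times. This is an assumed form chosen because it matches the two stated facts (consecutive transitions only, non-instantaneous states); the paper's own expressions are in its Equations (11) and (12) and the supplemental material, and may differ. The names `mu_sec` and `mu_post` are hypothetical.

```python
import math


def reversed_transition_probs(s, mu_sec, mu_post):
    """Transition probabilities over s years of reversed time.

    Rows: reported level k = 1, 2, 3; columns: level l at marriage.
    mu_post is the rate of the reversed move 3 -> 2,
    mu_sec is the rate of the reversed move 2 -> 1.
    """
    p22 = math.exp(-mu_sec * s)   # still at secondary after s years
    p33 = math.exp(-mu_post * s)  # still at post-secondary after s years
    if math.isclose(mu_sec, mu_post):
        # Limiting case of equal rates.
        p32 = mu_post * s * math.exp(-mu_post * s)
    else:
        p32 = mu_post / (mu_sec - mu_post) * (p33 - p22)
    return [
        [1.0, 0.0, 0.0],                  # primary is absorbing
        [1.0 - p22, p22, 0.0],
        [1.0 - p33 - p32, p32, p33],
    ]
```

Under this form, the probability of having had only primary education at marriage is small for short s and grows as s increases, which is the qualitative behaviour described in the example with s = 2.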

The prior distribution

To complete the specification of the adjusted model, we specify prior distributions for the effect parameter paths and the transition rates μ. To ensure smoothness in the regression parameters across different intervals , we assume a prior of the form where is the Gaussian random walk, and is the variance that controls the evolution of the effect parameters through the intervals. Often, is considered constant throughout the intervals and a prior is set on it [7,12,21,23]. It is also possible to estimate by using the discount procedure in [24]. We follow this procedure in this paper and assume that is proportional to the covariance matrix, of the posterior of the parameter in the preceding interval : where is the discounting factor regulating the amount of information transferred from the previous to the current interval. As , then , leading the prior (15) to become non-informative; hence there is no communication among intervals. On the other hand, as , then , which leads to a static evolution of the effect parameters. Here, we set , following [20], to allow the effect parameters to evolve smoothly and adapt to any local change that may occur in the hazard function. Further, we set the initial distribution to a non-informative normal distribution , where is a three-dimensional unit matrix, to express lack of information in the first interval. Finally, given that μ is defined on the positive real line, we propose independent gamma priors for the transition rates : where and are positive real numbers which we set to: . In this way, we assume that a priori the rates are nearly zero and let the data dictate the optimal rates. Implicitly, the prior (16) suggests that we start a priori with the anticipatory model, as and are close to zero and Z is close to the highest reported educational level.
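For orientation, a discount-factor random-walk prior is commonly written as below. This is the standard form from the dynamic linear model literature, stated here under assumed notation (β_j for the effect parameters in interval j, C_{j-1} for the posterior covariance in the preceding interval, α for the discount factor); it is not copied from the paper's Equations (14) and (15).

```latex
% Assumed notation: \beta_j = effect parameters in interval j,
% C_{j-1} = posterior covariance in the preceding interval,
% \alpha = discount factor.
\beta_j = \beta_{j-1} + w_j, \qquad w_j \sim N(0, W_j), \qquad
W_j = \frac{1-\alpha}{\alpha}\, C_{j-1}, \qquad 0 < \alpha < 1 .
```

This form reproduces the two limits described in the text: the innovation variance grows without bound as α tends to 0 (no information carried over between intervals) and shrinks to zero as α tends to 1 (static effects).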

The posterior distribution

The joint posterior distribution for the parameters in the adjusted model can be expressed as where , denotes the observed marriage duration, and represent, respectively, the educational level at marriage time and the reported highest educational level for all n individuals. The parameter is high-dimensional (where the dimension grows with J); therefore, estimating it jointly may be computationally inefficient. One possibility to address this issue is to estimate the marginal posterior of , commonly referred to as the filtering distribution, recursively through time as follows: where , , , are, respectively, vectors of educational levels and exposure times of individual who are at risk in , and The conditional posterior of the rate parameter, μ, can be expressed as where is the joint prior of μ, and are the transition probabilities defined in Equation (7) and estimated in Equation (12). Lastly, the conditional posterior of the latent parameter Z can be expressed as

Inference

Since the reduced and anticipatory models are nested within the adjusted model, we discuss only the method of sampling from the adjusted model. The same procedure applies to the former models with the non-applicable steps skipped. To sample from the joint posterior distribution of the adjusted model, we apply a Gibbs sampler that follows three steps at each iteration: (1) sample the transition rates μ, (2) sample the latent education levels Z, and (3) sample the path of the effect parameters. Starting from an initial path of the effect parameters and an initial education level, we sample the vector of rates from its conditional posterior distribution given in Equation (20). Then, for each individual, we sample an education level from the categorical distribution in Equation (8) with probability proportional to the expression in Equation (21). Finally, we sample a path from the conditional posterior in Equation (18). The first and last steps cannot be implemented directly because the expressions in Equations (18) and (20) are not tractable. We replace the first step by a Metropolis–Hastings kernel. Thus, at the mth iteration, a candidate value of μ is drawn from a proposal distribution g and is accepted with the corresponding Metropolis–Hastings acceptance probability. Since the rates are defined on the positive real line, we propose the two rates independently from gamma distributions. To select the parameter values in the proposal distribution, we use the observed mean and variance of s for each educational level (in the anticipatory cases) as guidance. According to (12), the distribution of s is implicitly an exponential distribution, which implies that the mean of s is inversely related to the rate μ. The parameters in the proposal distribution are, therefore, obtained by matching the observed mean and variance of s computed from the data with the corresponding moments of an inverse-gamma distribution. The observed mean and variance are 4.28 and 11.55, respectively, for the secondary level, and 4.19 and 7.12 for the post-secondary level.
Hence, the proposal distribution g is set to independent and for and , respectively. The final step requires evaluating the integral (19) which, for our model, is intractable. In addition, the autocorrelations of the effect parameters induced by the random walk prior process (15) may hinder the convergence of the sampler. To overcome these issues, Andrieu et al. [1] suggest using a conditional sequential Monte Carlo kernel (commonly known as particle filter) to approximate the conditional posterior in Equation (18). Particle filters approximate empirically by a discrete distribution defined on a finite set of points , referred to as particles, with probability masses commonly known as importance weights. With this weighted sample of particles, the integral (19) can be approximated as Particle filter algorithms provide a framework of computing the importance weights recursively as new data from intervals are observed, and they enable efficient and computationally fast inference in dynamic models. A comparison of the particle filter and the Markov Chain Monte Carlo (MCMC) algorithms applied to survival dynamic models is provided in [20]. The conditional sequential Monte Carlo kernel of Andrieu et al. [1] runs the particle filter with the condition that one reference path is a priori set deterministically and after a complete run of the particle filter, one path is selected from the sample of particles . However, setting the reference path deterministically may lead to a slow mixing of the sampler because particle filters are prone to degeneracy in the importance weights. This problem occurs when a few particles have significantly high importance weights but the rest have importance weights close to zero. To overcome this problem, Lindsten et al. [19] suggest randomizing the reference path through an extra resampling step referred to as ancestor sampling. 
The ancestor sampling procedure allows fast mixing of the sampler and does not require many particles in the underlying particle filter step [19]. We, therefore, follow the particle Gibbs with ancestor sampling (PGAS) of Lindsten et al. [19] and apply the particle filter algorithm of Munezero [20]. This algorithm was designed specifically for survival data and proven to be computationally fast and efficient. In our illustrative example, we achieved fast convergence by setting the number of particles H to 150. See supplemental material for details on convergence diagnostics and posterior predictive checks.
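The moment-matching step used to choose the proposal parameters can be sketched as follows. The formulas are the standard mean and variance of an inverse-gamma distribution; the function name is hypothetical, and the resulting numbers are what this particular calculation yields, not values confirmed by the paper.

```python
def inverse_gamma_from_moments(mean, variance):
    """Shape a and scale b of an inverse-gamma with the given mean and variance.

    Uses mean = b / (a - 1) and variance = b**2 / ((a - 1)**2 * (a - 2)),
    which requires a > 2.
    """
    shape = mean ** 2 / variance + 2.0
    scale = mean * (shape - 1.0)
    return shape, scale


# Observed mean and variance of s quoted in the text, by reported level.
observed = {"secondary": (4.28, 11.55), "post_secondary": (4.19, 7.12)}
proposal = {
    level: inverse_gamma_from_moments(m, v) for level, (m, v) in observed.items()
}
```

If the mean time 1/μ is given an inverse-gamma distribution with shape a and scale b, then the rate μ itself follows a gamma distribution with shape a and rate b, which is one way to arrive at a gamma proposal for the rates.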

Analysis of effect of education on divorce risk

Preliminary analyses using Cox proportional hazards (PH) model

To begin with, we fitted a standard Cox proportional hazards (PH) model [3] to the entire data set as well as separately to the non-anticipatory and anticipatory cases (those on the left and right of the main diagonal in Figure 1, respectively), with results shown in Table 2. Using those with secondary level education as the baseline (reference) category, we note, for the entire sample, that those with primary level education have about 10% lower risk of divorce while those with post-secondary level education have about 19% higher risk. In other words, divorce risk increases with educational level, though the p-values indicate that the differences are not significant at the 5% level.

Table 2.

Relative risks of divorce by educational level (from Cox PH model).

Education	All (n=1312)	Non-Anticip. (n=1067)	Anticip. (n=245)
Primary	0.897 (p=0.502)	1.050 (p=0.074)	-
Secondary	1	1	1
Post-secon.	1.194 (p=0.314)	1.596 (p=0.068)	0.693 (p=0.160)

For those who completed their reported educational level before marriage (the non-anticipatory cases to the left of the diagonal in Figure 1), the corresponding excess risks of divorce are about 5% for those with primary level education and about 60% for those with post-secondary education. Again, the results are not significant at the 5% level. In contrast, the results for the anticipatory cases point in the opposite direction: those with post-secondary education have about 31% lower risk of divorce than those with secondary education. The corresponding p-value shows that this difference is also not statistically significant at the 5% level. These preliminary results give an early warning that educational gradients in divorce risks differ between the anticipatory and non-anticipatory cases, which motivates our proposed adjusted model.
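The percentage differences discussed for Table 2 follow from the relative risks as (RR − 1) × 100. A minimal sketch, with an illustrative function name and the relative risks copied from Table 2:

```python
def percent_difference(relative_risk):
    """Percentage by which the risk differs from the reference category (RR = 1)."""
    return round((relative_risk - 1.0) * 100.0, 1)


# Relative risks from Table 2 (secondary education is the reference, RR = 1).
table2 = {
    "all": {"primary": 0.897, "post_secondary": 1.194},
    "non_anticipatory": {"primary": 1.050, "post_secondary": 1.596},
    "anticipatory": {"post_secondary": 0.693},
}

differences = {
    group: {level: percent_difference(rr) for level, rr in rrs.items()}
    for group, rrs in table2.items()
}
```

Negative values indicate a lower risk than the reference category and positive values a higher risk.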

Results from the adjusted model and comparison with previous findings

Our proposed model requires defining the J intervals that partition the marriage duration. Common practice sets the interval limits at each event time [6]. However, doing this would result in J = 151 intervals, which would require a large computational effort. To alleviate this, Munezero [20] proposes setting the intervals at event times in such a way that all intervals contain the same number of events. Inference about the optimal number of events per interval, E, can be done using any predictive information criterion. Following [20], we use the Watanabe–Akaike information criterion, WAIC (see [8] for the definition and details). We obtain the following WAIC values: 2209.28, 2205.49, 2206.47 and 2208.15, respectively, for E = 5, E = 10, E = 15 and E = 20. The lowest WAIC is obtained for E = 10, which leads to a partition of time into J = 15 intervals. In Figure 2, we present the estimated probabilities of transition for different educational levels. It turns out that a respondent (among the anticipatory cases) who reported post-secondary education is most likely to have had secondary level education at his marriage date if he spent about 2–4 years studying after marriage. However, if he spent at least 4 years studying, it is more likely that he completed two educational levels, that is, he had primary level education at his marriage date. For those who reported secondary education, the probability that they had primary level education at their marriage date becomes considerably high after 2.5 years of study after marriage. The estimated mean times to complete secondary and post-secondary education are years and years, respectively.
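The interval construction and the choice of E can be sketched as follows. The handling of leftover events (merged into the last interval) is an assumption made here, since the text does not say how a remainder is treated, and the function name is hypothetical. The sketch also assumes a non-empty list of distinct event times.

```python
def equal_event_cut_points(event_times, events_per_interval):
    """Upper limits of intervals that each contain the same number of events.

    Any remainder is merged into the last interval, which ends at the
    largest event time (an assumption, not taken from the paper).
    """
    times = sorted(event_times)
    n_intervals = max(1, len(times) // events_per_interval)
    cuts = [times[j * events_per_interval - 1] for j in range(1, n_intervals)]
    cuts.append(times[-1])
    return cuts


# WAIC values reported in the text, keyed by events per interval E.
waic = {5: 2209.28, 10: 2205.49, 15: 2206.47, 20: 2208.15}
best_e = min(waic, key=waic.get)  # E with the lowest WAIC
```

With 151 distinct event times and E = 10, this convention yields 15 intervals, which agrees with the J = 15 reported, although the paper's exact rule may differ.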

Figure 2.

Estimated conditional probabilities of educational levels at marriage time for given educational levels at survey time

From the estimated transition rates, we can also compute and which are estimates of the proportions (among the anticipatory cases) with secondary and post-secondary level education, respectively. These figures can be compared with the corresponding entries in Table 1 ( and , respectively). We use these quantities in our posterior predictive checks; see the supplemental material. Figure 3 shows relative risks of divorce for those with primary level education (relative to those with secondary level education) by marriage duration and across the three models (reduced model, anticipatory model and adjusted model). The corresponding relative risks for those with post-secondary education are shown in Figure 4. We note from the figures that the relative risks are underestimated in the anticipatory model (red line) compared to the reduced model and adjusted model (black and blue, respectively) for most marriage durations. We also note that the degree of underestimation varies across marriage durations and that after about 11 years of marriage the underestimation turns to overestimation.

Figure 3.

Relative risks of divorce for primary education relative to secondary education across the three models.

Figure 4.

Relative risks of divorce for post-secondary education relative to secondary education across the three models.

Figures 5–7 show relative risks of divorce for those with primary and post-secondary education (relative to those with secondary education) in the Reduced Model, Anticipatory Model and Adjusted Model, respectively. Comparing Figures 6 and 7 we note that those with post-secondary education have higher risk of divorce than the baseline level for most of the marriage durations in the adjusted model (Figure 7) but that such super-risk is underestimated (in some intervals to the extent of being sub-risk) in the anticipatory model (Figure 6). Even those with primary-level education seem to have higher risks of divorce than the baseline over most of the marriage durations but, again, these are underestimated in the anticipatory model (Figure 6) compared to the adjusted model (Figure 7). Figure 5 shows that the educational gradient of divorce is much higher in the reduced model (where the anticipatory cases are discarded) and this is consistent with what we found in the preliminary analyses in Section 4.1.

Figure 5.

Relative risks of divorce for primary and post-secondary education relative to secondary education in the Reduced Model.

Figure 6.

Relative risks of divorce for primary and post-secondary education (relative to secondary education) in the Anticipatory Model.

Figure 7.

Relative risks of divorce for primary and post-secondary education (relative to secondary education) in the Adjusted Model.

Figure 8 presents the percentage under- or over-estimation of relative risks of divorce in the anticipatory model (compared to the adjusted model). We note here that the relative risks of divorce for both educational levels (primary and post-secondary) are underestimated in the anticipatory model for marriage durations up to 10 years, after which they are overestimated.

Figure 8.

Percentage under-estimation (or over-estimation) of relative risks of divorce in the anticipatory model compared to the adjusted model.
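The quantity plotted in Figure 8 can be illustrated with a minimal sketch. The text does not state the exact formula, so the sketch assumes the percentage is the signed deviation of the anticipatory-model relative risk from the adjusted-model relative risk, expressed relative to the latter. The function name `percent_bias` and the input values are hypothetical and are not taken from the paper.

```python
def percent_bias(rr_anticipatory, rr_adjusted):
    """Signed percentage deviation of anticipatory-model relative risks
    from adjusted-model relative risks, interval by interval.

    Assumed definition (not stated in the paper):
        100 * (RR_anticipatory - RR_adjusted) / RR_adjusted
    Negative values indicate under-estimation, positive values over-estimation.
    """
    if len(rr_anticipatory) != len(rr_adjusted):
        raise ValueError("both series must cover the same duration intervals")
    return [100.0 * (a - b) / b for a, b in zip(rr_anticipatory, rr_adjusted)]


# Hypothetical relative risks for three duration intervals (illustration only).
rr_ant = [0.90, 1.00, 1.32]
rr_adj = [1.00, 1.00, 1.20]
bias = percent_bias(rr_ant, rr_adj)  # roughly -10%, 0% and +10%
```

Under this definition, a sign change in the series corresponds to the switch from under-estimation to over-estimation described for Figure 8.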

In Table 3, we summarize our results and compare them with those from two previous studies: Ghilagaber and Koskinen (G & K [9]) and Ghilagaber and Larsson (G & L [10]). Our present results, shown in the last column, are obtained by averaging the corresponding relative risks over the J = 15 time intervals partitioning the marriage duration.
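As a rough illustration of this averaging step, the sketch below collapses interval-specific relative risks into a single summary. It assumes an unweighted mean over the J = 15 intervals and, for the interval estimate, assumes that each posterior draw is averaged first and the averaged draws are then summarized by approximate 2.5% and 97.5% order statistics. The text does not specify these details, and the draws here are synthetic.

```python
import random
import statistics

J = 15  # number of marriage-duration intervals, as stated in the text


def average_over_intervals(rr_by_interval):
    """Unweighted mean of the J interval-specific relative risks (assumed)."""
    if len(rr_by_interval) != J:
        raise ValueError(f"expected {J} interval-specific values")
    return statistics.fmean(rr_by_interval)


def summarize_draws(draws):
    """Average each draw over the intervals, then return the mean of the
    averaged draws and an approximate equal-tailed 95% interval taken
    from their order statistics."""
    averaged = sorted(average_over_intervals(d) for d in draws)
    lo = averaged[int(0.025 * (len(averaged) - 1))]
    hi = averaged[int(0.975 * (len(averaged) - 1))]
    return statistics.fmean(averaged), (lo, hi)


# Synthetic draws for illustration only (not the paper's posterior output).
rng = random.Random(0)
draws = [[rng.lognormvariate(0.0, 0.2) for _ in range(J)] for _ in range(1000)]
mean_rr, (lower, upper) = summarize_draws(draws)
```

Averaging within each draw before summarizing keeps the interval consistent with the averaged quantity, rather than averaging interval-specific bounds.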

Table 3.

Model	Education	G & K [9]	G & L [10]	Present work
Reduced model	Primary	1.04 (0.74,1.51)	1.03 (0.72,1.49)	1.07 (0.45,2.48)
Reduced model	Secondary (ref)	1	1	1
Reduced model	Post-secondary	1.65 (1.38,1.85)	1.62 (1.34,1.81)	1.16 (0.78,1.78)
Anticipatory model	Primary	0.89 (0.65,1.23)	0.89 (0.64,1.20)	0.89 (0.42,1.91)
Anticipatory model	Secondary (ref)	1	1	1
Anticipatory model	Post-secondary	1.21 (1.14,1.26)	1.19 (1.12,1.22)	1.17 (1.03,1.26)
Adjusted model	Primary	1.02 (0.77,1.41)	0.95 (0.69,1.28)	1.06 (0.47,2.43)
Adjusted model	Secondary (ref)	1	1	1
Adjusted model	Post-secondary	1.09 (0.98,1.18)	1.28 (1.17,1.33)	1.26 (0.59,3.59)
Estimated years to complete	Primary	14.80	16.40	-
Estimated years to complete	Secondary	6.5	3.65	3.56
Estimated years to complete	Post-secondary	6.3	1.90	2.43

Estimated educational gradients of divorce-risks (and 95% confidence intervals in brackets) in the reduced, anticipatory and adjusted models from three studies. Results in the third column refer to those of Ghilagaber and Koskinen (2009), results in the fourth column refer to those of Ghilagaber and Larsson (2019), while the results in the last column are our findings in the present study.

We note that our results in the adjusted model (last column, third panel) are closer to those of Ghilagaber and Koskinen [9] for those with primary level education. For those with post-secondary education, our results are closer to those of Ghilagaber and Larsson [10]. Note, however, that the three approaches are not comparable in all aspects, and we will focus on our own results in the discussion below. While the three studies analyse the same data set, the two previous studies assume constant relative risks of divorce across marriage duration, whereas the present study allows relative risks to vary over marriage duration. Further, the previous studies consider a Markovian process evolving forward in time (from birth) for times to complete the various educational levels. In the present study, we model educational levels using a reversed (backward) Markov chain and allow the transition probabilities between educational levels to depend on the time spent on education after marriage. As measures of uncertainty and 'indirect' indicators of goodness-of-fit, we also present, in Table 3, confidence/credible intervals and the estimated number of years to complete the respective educational levels. Thus the expected number of years to complete primary-level education is 14.8 years (counted from birth) according to Ghilagaber and Koskinen [9] and 14.4 years according to Ghilagaber and Larsson [10]. We do not have a corresponding estimate in the current study because, as mentioned before, the formulation in the present study treats primary level education as an absorbing state, since all individuals who reported primary-level education at the survey time completed it before they married (and, hence, there is no anticipatory case among those with primary level education).
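The idea of a backward chain with primary education as an absorbing state can be illustrated with a deliberately simplified sketch. It is not the model estimated in the paper. It assumes a discrete-time, time-homogeneous chain in which, for each year of post-marriage education traced backward, a respondent's level either stays the same or drops by one. The per-year step-down probabilities in `step_down` are hypothetical, as are the function names.

```python
def backward_step(dist, step_down):
    """One backward-in-time step of the chain.

    dist[k] is the probability of level k (0 = primary, 1 = secondary,
    2 = post-secondary). step_down[k] is the hypothetical probability that
    the level was one step lower one year earlier.
    """
    new = [0.0, 0.0, 0.0]
    for k, p in enumerate(dist):
        if k == 0:
            new[0] += p  # primary is absorbing: no lower level to move to
        else:
            new[k] += p * (1.0 - step_down[k])
            new[k - 1] += p * step_down[k]
    return new


def level_at_marriage(survey_level, years_after_marriage, step_down):
    """Distribution of the level at marriage, given the level reported at
    the survey and the (assumed known) years of post-marriage education."""
    dist = [0.0, 0.0, 0.0]
    dist[survey_level] = 1.0
    for _ in range(years_after_marriage):
        dist = backward_step(dist, step_down)
    return dist


# Illustrative values only; these are not estimates from the paper.
step_down = [0.0, 0.10, 0.20]
dist = level_at_marriage(2, 3, step_down)
```

In this simplification, the dependence on time after marriage enters only through the number of backward steps. A respondent reporting primary education at the survey is assigned primary education at marriage with probability one, mirroring the absorbing-state argument above.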

Discussion

We addressed an important issue in inference with observational data where values of a covariate refer to what is achieved by the survey time but the covariate is used as a regressor in models that relate it to behaviour that took place long before the survey. This causes a problem because the value of the current-date (anticipatory) covariate does not follow the temporal order of events. We attempted to tackle the problem by proposing a dynamic Bayesian approach that models the event of interest and the anticipatory covariate jointly and allows the effect parameters to vary over time. We illustrated our problem by modelling the effects of educational level attained at survey time on the risk of divorce among Swedish men. The results showed that ignoring the anticipatory nature of the education variable led to underestimation of educational gradients in divorce for marriage durations up to about 10 years. After 10 years, the relative risks are instead overestimated. This may be associated with differential risks of divorce between those who completed their reported educational level shortly after marriage and those who did so long after marriage and closer to the survey date. Overall, ignoring the anticipatory nature of the education variable led to spurious significance of the relative risk of divorce for those with post-secondary education (relative to those with secondary education) and to a much lower risk for those with primary education without affecting its significance. We also demonstrated that the relative risks are not constant over marriage durations, something that is ignored when using standard methods such as proportional hazards models.

There are some open questions that can be addressed in future investigations. For instance, it is not clear whether the anticipatory nature of our education variable (the fact that the unobserved educational level comes before the observed one) makes the problem different from classical misclassification problems. We believe that our present study does not suffer from differential measurement error because we were interested in the effect of educational level attained by the time of marriage. A possible suggestion for future investigation is, thus, to examine the prevalence, impact and adjustment of differential measurement error where the likelihood of having anticipatory education (completing it after divorce) is correlated with divorce risk. This can arise if, for instance, the experience of divorce increases or decreases the propensity to continue education.
