
Informative prior on structural equation modelling with non-homogenous error structure.

Oladapo A Olalude1, Bernard O Muse2, Oluwayemisi O Alaba1.   

Abstract

Introduction: This study investigates the impact of informative priors on the Bayesian structural equation model (BSEM) with heteroscedastic error structure. A major drawback of the homogeneous error structure is that, in most studies, the underlying assumption of equal variance across observations is unrealistic, hence the need to consider a non-homogeneous error structure.
Methods: Using appropriately updated informative priors, four different forms of heteroscedastic error structure were considered at sample sizes of 50, 100, 200 and 500.
Results: The results show that both the posterior predictive probability (PPP) and the log likelihood are influenced by the sample size and the prior information, and that the model with the linear form of error structure is the best.
Conclusions: The study addresses the problem of heteroscedasticity of known form using four different heteroscedastic conditions. The linear form outperformed the other forms of heteroscedastic error structure and can therefore accommodate data that violate the homogeneous variance assumption, provided an appropriate informative prior is updated. This approach thus provides an alternative to the existing classical method, which depends solely on the sample information.
Copyright: © 2022 Olalude OA et al.

Keywords:  Bayesian SEM; Heteroscedastic error structure; Latent Variable; Observed Variable; Predictive Performance

Year:  2022        PMID: 36212550      PMCID: PMC9515605          DOI: 10.12688/f1000research.108886.2

Source DB:  PubMed          Journal:  F1000Res        ISSN: 2046-1402


Introduction

Bayesian structural equation modeling (BSEM) analyses the relationships between observed, unobserved, and latent variables within the Bayesian context. The model can be visualized with a path diagram. Despite the rising number of statistical research ideas that have been created and verified using structural equation modelling (SEM), and despite the propensity of multicollinearity, heteroscedasticity, and non-normality to skew statistical estimates and inference, BSEM investigations, unlike classical regression studies, rarely mention the use of statistical approaches for measurement and structural model assessment or for detecting these problems and combinations thereof. We therefore suggest the use of diagnostic tests for the presence of multicollinearity, heteroscedasticity, and non-normality.

In Bayesian inference, the parameter $\theta$ is random, which depicts the level of uncertainty about the true value of $\theta$, because both the observed data $y$ and the parameters are assumed random. The joint probability of the parameters and the data can be modelled as a function of the conditional distribution of the data given the parameters and the prior distribution of the parameters. More formally,

$$P(\theta \mid y) = \frac{P(y \mid \theta)\, P(\theta)}{P(y)} \qquad (1)$$

where $P(\theta \mid y)$ is the posterior distribution, $P(\theta)$ is the prior distribution, and $P(y \mid \theta)$ is the likelihood function. The numerator of (1) is the un-normalized posterior distribution. When $P(y \mid \theta)$ is expressed in terms of the unknown parameters $\theta$ for fixed values of $y$, this term is the likelihood $L(\theta \mid y)$. Thus, (1) can be rewritten as:

$$P(\theta \mid y) \propto L(\theta \mid y)\, P(\theta) \qquad (2)$$

Studies abound on classical methods and Bayesian methods with a focus on homogeneous variance. This study explores the BSEM using different forms of heteroscedastic error structure.
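As a small illustration of the proportionality between the posterior and the product of likelihood and prior, the sketch below uses a deliberately simple model that is not the SEM of this paper: a normal mean with known noise variance and a normal prior. The data values, prior settings, and function names are all assumptions made for the example. It computes the posterior mean twice, once from the closed-form conjugate update and once by numerically normalizing likelihood times prior on a grid.

```python
import math


def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Closed-form posterior for a normal mean with known noise variance.

    Prior: theta ~ N(prior_mean, prior_var).
    Likelihood: y_i ~ N(theta, noise_var), independently.
    Returns (posterior_mean, posterior_var).
    """
    prior_prec = 1.0 / prior_var
    data_prec = len(data) / noise_var
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(data) / noise_var)
    return post_mean, post_var


def grid_posterior_mean(prior_mean, prior_var, data, noise_var,
                        lo=-10.0, hi=10.0, steps=20001):
    """Posterior mean obtained by normalizing likelihood x prior on a grid."""
    width = (hi - lo) / (steps - 1)
    total = 0.0
    weighted = 0.0
    for k in range(steps):
        theta = lo + k * width
        log_prior = -0.5 * (theta - prior_mean) ** 2 / prior_var
        log_lik = sum(-0.5 * (y - theta) ** 2 / noise_var for y in data)
        weight = math.exp(log_prior + log_lik)  # un-normalized posterior
        total += weight
        weighted += theta * weight
    return weighted / total


# Hypothetical data, chosen only for illustration.
data = [1.8, 2.3, 2.1, 1.9]
exact_mean, exact_var = normal_posterior(0.0, 4.0, data, 1.0)
numeric_mean = grid_posterior_mean(0.0, 4.0, data, 1.0)
```

The two means agree to numerical precision, which is all that the normalizing constant $P(y)$ contributes: it rescales the un-normalized posterior without changing its shape.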

Methods

Bayesian estimation of structural equation models (SEM)

This section develops a Gibbs sampler to estimate SEM with reflective measurement indicators. The Bayesian estimation is illustrated by considering a SEM that is equivalent to the most commonly used model. A SEM is composed of a measurement equation (3) and a structural equation (4) : where It is assumed that measurement errors are uncorrelated with and , residuals are uncorrelated with and the variables are distributed as follows: , where and are diagonal matrices. The covariance matrix of is derived based on the SEM:

Prior distributions

In order to enable Gibbs sampling from full conditional posterior distributions, natural conjugate prior distributions for the unknown parameters are considered. Let be the kth diagonal element of , be the th diagonal element of be the kth row of and be the lth row of M, with and

Derivations of conditional distributions

The joint posterior of all unknown parameters is proportional to the likelihood times the prior, Given Y and , and are independent from . Draws of , can cause estimation of and as a simple regression model. Thus, sampling from the posterior distribution of and without reference to The same holds for inference with regard to M, and , which are independent from Y given .

Heteroscedastic error structures

The functional forms of heteroscedastic error variance under consideration are the double logarithmic form, the linear form, the linear-inverse form and the linear-absolute form, as expressed in equations 17, 18, 19 and 20, respectively. Each of these functional forms of heteroscedastic error structure is incorporated into the modified model. The variance matrix for the disturbance vector is given as
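Equations 17 to 20 are not reproduced in this text, so the sketch below should be read only as one plausible reading of the four names. The parameterizations (a scale `sigma2`, an intercept `a`, a slope `b`, and a single covariate `x`) are assumptions taken from common textbook treatments of heteroscedasticity, not the authors' definitions. The sketch shows how an error variance that depends on a covariate can be turned into simulated disturbances.

```python
import math
import random


def error_variance(form, x, sigma2=1.0, a=1.0, b=0.5):
    """Error variance for one observation under an ASSUMED functional form.

    double_log      : log(var) = log(sigma2) + b * log(x)   (requires x > 0)
    linear          : var = sigma2 * (a + b * x)
    linear_inverse  : var = sigma2 * (a + b / x)            (requires x != 0)
    linear_absolute : var = sigma2 * (a + b * |x|)
    """
    if form == "double_log":
        return sigma2 * x ** b
    if form == "linear":
        return sigma2 * (a + b * x)
    if form == "linear_inverse":
        return sigma2 * (a + b / x)
    if form == "linear_absolute":
        return sigma2 * (a + b * abs(x))
    raise ValueError(f"unknown form: {form}")


def simulate_errors(form, xs, seed=0, **params):
    """Draw one heteroscedastic normal disturbance per covariate value."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, math.sqrt(error_variance(form, x, **params)))
            for x in xs]


FORMS = ("double_log", "linear", "linear_inverse", "linear_absolute")
xs = [0.5 + 0.1 * k for k in range(50)]  # hypothetical positive covariate
errors_by_form = {form: simulate_errors(form, xs) for form in FORMS}
```

Keeping the covariate strictly positive avoids negative or undefined variances under the double logarithmic and linear-inverse readings; a different parameterization would need its own constraint.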

The posterior distribution

The posterior density is proportional to the product of the likelihood and the chosen prior distribution. Since the full posterior distribution is intractable, the Markov chain Monte Carlo (MCMC) simulation method of Gibbs sampling is employed. This involves the use of marginal posterior distribution. Also Consider an informative prior created by set. And letting c The posterior distribution of conditional on , h, is given by: Solving the exponential part of the above equation, we will have: Therefore, The additional term not involving is factored out to give: Factorization in terms of , the term in the exponential becomes: So, the posterior density of conditioned on other parameter h, , y ∗ is a multivariate normal with mean and variance . That is, The posterior distribution of h conditional on , , is given by: The posterior distribution of Ω *, conditional on y *, λ *, h, is given by:
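For orientation, the standard textbook conditional posteriors for a normal linear model with a general error covariance are written out below. This is a sketch under stated assumptions rather than a reconstruction of the equations lost from this section: the design matrix $X^{*}$, the prior hyper-parameters $\lambda_0$, $V_0$, $\nu_0$, $s_0^{2}$, and the sample size $N$ are symbols introduced here for illustration, with independent priors $\lambda^{*} \sim N(\lambda_0, V_0)$ and $h \sim \mathrm{Gamma}(\text{shape } \nu_0/2, \text{rate } \nu_0 s_0^{2}/2)$.

```latex
% Model (assumed): y^* = X^* \lambda^* + \varepsilon,
%                  \varepsilon \sim N(0, h^{-1} \Omega^*)
\begin{align*}
\lambda^{*} \mid y^{*}, h, \Omega^{*} &\sim N(\bar{\lambda}, \bar{V}), \\
\bar{V} &= \left( V_0^{-1} + h\, X^{*\prime} \Omega^{*-1} X^{*} \right)^{-1}, \\
\bar{\lambda} &= \bar{V} \left( V_0^{-1} \lambda_0
                 + h\, X^{*\prime} \Omega^{*-1} y^{*} \right), \\[6pt]
h \mid y^{*}, \lambda^{*}, \Omega^{*} &\sim
   \mathrm{Gamma}\!\left( \text{shape} = \tfrac{\bar{\nu}}{2},\;
                          \text{rate} = \tfrac{\bar{\nu}\,\bar{s}^{2}}{2} \right), \\
\bar{\nu} &= N + \nu_0, \\
\bar{s}^{2} &= \frac{ (y^{*} - X^{*}\lambda^{*})^{\prime} \Omega^{*-1}
                      (y^{*} - X^{*}\lambda^{*}) + \nu_0 s_0^{2} }{ \bar{\nu} }.
\end{align*}
```

Both results follow from completing the square in $\lambda^{*}$ and collecting powers of $h$ in the product of likelihood and prior, which matches the factorization steps described in the text. The conditional for $\Omega^{*}$ depends on the chosen heteroscedastic form and is not sketched here.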

The Gibbs sampler

The Gibbs sampling procedure used in this study involves the generation of a sequence of draws from the conditional posterior distribution of each parameter.

Gibbs sampling procedure

1. Choose a starting (initial) value for the parameters.
2. Take a random draw of the first parameter block from its full conditional, given the starting values of the other blocks.
3. Take a random draw of the next parameter block from its full conditional, using the updated values of the blocks already drawn.
4. Repeat until M draws are obtained, each being a vector of the parameters.
5. Perform the burn-in by dropping an initial set of these draws to eliminate the effect of the starting value; the remaining draws are then averaged to obtain the estimate of the posterior mean.

The right-hand side of (15) is proportional to the density function of an inverse Wishart distribution.
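The steps above can be sketched on a much smaller problem than the full SEM. The example below is an illustrative assumption, not the authors' sampler: it uses a single loading `lam` in the regression y_i = lam * x_i + e_i, with e_i ~ N(0, w_i / h), where the weights w_i are treated as known and follow an assumed linear variance form. The priors (normal for `lam`, gamma for the precision `h`) and every numeric setting are hypothetical; the true values 2.0 and 15.0 simply echo the assumed values used later in the results.

```python
import random


def gibbs_sampler(x, y, w, m0, v0, a0, b0, n_draws=3000, burn_in=500, seed=1):
    """Gibbs sampler for y_i = lam * x_i + e_i with e_i ~ N(0, w_i / h).

    Priors (assumed): lam ~ N(m0, v0), h ~ Gamma(shape=a0, rate=b0).
    Returns (posterior mean of lam, posterior mean of h, kept draws).
    """
    rng = random.Random(seed)
    n = len(y)
    sxx = sum(xi * xi / wi for xi, wi in zip(x, w))
    sxy = sum(xi * yi / wi for xi, yi, wi in zip(x, y, w))

    lam, h = m0, a0 / b0  # step 1: starting values
    draws = []
    for _ in range(n_draws):  # step 4: repeat until n_draws are obtained
        # step 2: draw lam from its normal full conditional given h
        prec = 1.0 / v0 + h * sxx
        mean = (m0 / v0 + h * sxy) / prec
        lam = rng.gauss(mean, prec ** -0.5)
        # step 3: draw h from its gamma full conditional given the new lam
        sse = sum((yi - lam * xi) ** 2 / wi for xi, yi, wi in zip(x, y, w))
        rate = b0 + 0.5 * sse
        h = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)  # second arg is a scale
        draws.append((lam, h))

    kept = draws[burn_in:]  # step 5: discard the burn-in draws
    lam_mean = sum(d[0] for d in kept) / len(kept)
    h_mean = sum(d[1] for d in kept) / len(kept)
    return lam_mean, h_mean, kept


# Simulated data with true lam = 2.0 and true precision h = 15.0.
data_rng = random.Random(0)
x = [data_rng.uniform(1.0, 3.0) for _ in range(200)]
w = [1.0 + 0.5 * xi for xi in x]  # assumed linear variance form
y = [2.0 * xi + data_rng.gauss(0.0, (wi / 15.0) ** 0.5) for xi, wi in zip(x, w)]

lam_hat, h_hat, kept_draws = gibbs_sampler(x, y, w, m0=2.0, v0=1.0, a0=1.5, b0=0.1)
```

Note that `random.gammavariate` takes a shape and a scale, so the rate of the full conditional is inverted before the call. Because both full conditionals are standard distributions, no accept or reject step is needed.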

Design of simulation

The simulation was carried out under the different functional forms of heteroscedastic error structure at sample sizes of 50, 100, 200 and 500. Hyper-parameters were arbitrarily chosen for the simulation, which used the Gibbs sampler, an MCMC method. The R code can be accessed via the Extended data. To assess prior sensitivity, the factor loadings and the error precision were given multivariate normal and inverse gamma distributions, respectively. The posterior estimates are the criteria used to assess the performance of the posterior simulation technique. To evaluate the Bayesian model fit, we used the posterior predictive probability (PPP) procedure. After convergence is achieved (after j iterations), the subsequent draws can be regarded as observations from p(λ*, Ω | y) and are collected for statistical inference; they give the Bayesian estimates of the parameters and the latent variables.
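A posterior predictive probability can be sketched as follows. The text does not state which discrepancy measure was used, so the sum of squared standardized residuals below is an assumption, and the model (a normal mean with known variance and a conjugate normal prior) is chosen only because its posterior can be sampled directly. For each posterior draw, a replicated data set is simulated and its discrepancy is compared with that of the observed data; the PPP is the share of draws in which the replicated discrepancy is at least as large.

```python
import random


def posterior_predictive_p(y, prior_mean, prior_var, noise_var,
                           n_draws=2000, seed=7):
    """PPP for the model y_i ~ N(theta, noise_var), theta ~ N(prior_mean, prior_var).

    Discrepancy (assumed): T(y, theta) = sum((y_i - theta)^2) / noise_var.
    """
    rng = random.Random(seed)
    n = len(y)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(y) / noise_var)

    exceed = 0
    for _ in range(n_draws):
        theta = rng.gauss(post_mean, post_var ** 0.5)        # posterior draw
        y_rep = [rng.gauss(theta, noise_var ** 0.5) for _ in range(n)]
        t_obs = sum((yi - theta) ** 2 for yi in y) / noise_var
        t_rep = sum((yi - theta) ** 2 for yi in y_rep) / noise_var
        if t_rep >= t_obs:
            exceed += 1
    return exceed / n_draws


# Hypothetical data generated from the assumed model itself.
data_rng = random.Random(3)
y_obs = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
ppp = posterior_predictive_p(y_obs, prior_mean=2.0, prior_var=1.0, noise_var=1.0)
```

Values near 0.5 are usually read as adequate fit, while values close to 0 or 1 indicate that the observed data look unlike data replicated from the fitted model.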

Results and discussion

This section discusses the results: the performance of the estimators across the parameters for the different forms of heteroscedasticity, and the performance of Bayesian posterior simulation and analytical methods in the presence of heteroscedasticity, considering four different forms of heteroscedastic error structure over four sample sizes (50, 100, 200 and 500).

Performance of the estimators at heteroscedasticity condition

This gives the results for the latent and observed variables at various sample sizes for the four heteroscedastic error conditions considered.

Comparison of latent variable estimates at different sample sizes under the heteroscedasticity condition

The assumed values for the estimates were λ1 = 2.0, λ2 = 3.0 and precision = 15.0. The covariance matrix of ω was derived with M at fixed values (0 or 1). The Bayesian estimates of SEM using the independent normal-gamma priors were derived for the two classes of SEM. Hyper-parameters were arbitrarily chosen for the simulation, which used the Gibbs sampler, a Markov chain Monte Carlo (MCMC) method, since the joint posterior density does not have a tractable form. In what follows, each triple gives the values for λ1, λ2 and the precision (PR), with PM the posterior mean and PSD the posterior standard deviation, at a 95% credible interval.

For the double logarithmic form: when n=50, PM (2.011, 2.435, 13.202) and PSD (0.035, 0.033, 0.223); when n=100, PM (2.022, 2.528, 13.70) and PSD (0.023, 0.025, 0.251); when n=200, PM (2.052, 2.611, 14.4) and PSD (0.017, 0.018, 0.255); when n=500, PM (2.010, 2.801, 14.7) and PSD (0.031, 0.021, 0.258).

For the linear form: when n=50, PM (1.845, 2.779, 13.95) and PSD (0.240, 0.242, 0.235); when n=100, PM (1.861, 2.811, 14.22) and PSD (0.328, 0.226, 0.325); when n=200, PM (1.956, 2.921, 14.72) and PSD (0.219, 0.217, 0.212); when n=500, PM (2.120, 3.122, 14.95) and PSD (0.211, 0.311, 0.114).

For the linear-inverse form: when n=50, PM (1.882, 2.742, 14.95) and PSD (0.040, 0.028, 0.291); when n=100, PM (1.972, 2.835, 14.65) and PSD (0.024, 0.023, 0.229); when n=200, PM (1.988, 2.901, 14.45) and PSD (0.017, 0.016, 0.109); when n=500, PM (2.021, 3.003, 14.21) and PSD (0.011, 0.015, 0.105).

For the linear-absolute form: when n=50, PM (2.036, 2.824, 14.500) and PSD (0.032, 0.034, 0.122); when n=100, PM (1.908, 2.903, 13.92) and PSD (0.022, 0.026, 0.234); when n=200, PM (1.893, 2.809, 13.85) and PSD (0.017, 0.023, 0.311); when n=500, PM (1.806, 2.788, 13.55) and PSD (0.031, 0.035, 0.433).

We examine different forms of heteroscedastic error structure in Bayesian structural equation modeling using informative priors, rather than assuming homogeneous variance, which is often a statistical fallacy in many studies. We compare the models' posterior means and standard deviations in Tables 1, 2, 3 and 4. The differences are unlikely to impact substantive conclusions, but two of them are noteworthy.
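Table 1. Double logarithmic form on latent variable and observed variable estimates.

| Sample size | Latent variable | Posterior Mean (PM) | Posterior Standard Deviation (PSD) | Credible Interval (CI) lower | CI upper | Measured variable | Estimate | Standard Deviation |
|---|---|---|---|---|---|---|---|---|
| n=50 | λ1 | 2.011 | 0.035 | 1.959 | 2.062 | x1 | 0.045 | 0.023 |
| | λ2 | 2.435 | 0.033 | 2.384 | 2.485 | x2 | 0.038 | 0.023 |
| | Precision (PR) | 13.202 | 0.223 | 13.071 | 13.332 | | | |
| n=100 | λ1 | 2.022 | 0.023 | 1.979 | 2.064 | x1 | 0.053 | 0.008 |
| | λ2 | 2.528 | 0.025 | 2.484 | 2.571 | x2 | 0.037 | 0.024 |
| | Precision | 13.700 | 0.251 | 13.561 | 13.838 | | | |
| n=200 | λ1 | 2.052 | 0.017 | 2.015 | 2.088 | x1 | 0.006 | 0.045 |
| | λ2 | 2.611 | 0.018 | 2.573 | 2.648 | x2 | 0.048 | 0.020 |
| | Precision | 14.4 | 0.255 | 14.260 | 14.539 | | | |
| n=500 | λ1 | 2.010 | 0.031 | 1.961 | 2.058 | x1 | 0.040 | 0.028 |
| | λ2 | 2.801 | 0.021 | 2.760 | 2.841 | x2 | 0.018 | 0.004 |
| | Precision | 14.7 | 0.258 | 14.559 | 14.840 | | | |
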
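Table 2. Linear form on latent variable and observed variable estimates.

| Sample size | Latent variable | Posterior Mean (PM) | Posterior Standard Deviation (PSD) | Credible Interval (CI) lower | CI upper | Measured variable | Estimate | Standard Deviation |
|---|---|---|---|---|---|---|---|---|
| n=50 | λ1 | 1.845 | 0.240 | 1.709 | 1.981 | x1 | 0.078 | 0.017 |
| | λ2 | 2.779 | 0.242 | 2.643 | 2.915 | x2 | 0.055 | 0.036 |
| | Precision | 13.950 | 0.235 | 13.816 | 14.844 | | | |
| n=100 | λ1 | 1.861 | 0.328 | 1.702 | 2.0197 | x1 | 0.079 | 0.012 |
| | λ2 | 2.811 | 0.226 | 2.679 | 2.943 | x2 | 0.036 | 0.028 |
| | Precision | 14.220 | 0.325 | 14.062 | 14.378 | | | |
| n=200 | λ1 | 1.956 | 0.219 | 1.826 | 2.086 | x1 | 0.071 | 0.008 |
| | λ2 | 2.921 | 0.217 | 2.792 | 3.050 | x2 | 0.047 | 0.016 |
| | Precision | 14.72 | 0.212 | 14.542 | 14.898 | | | |
| n=500 | λ1 | 2.120 | 0.211 | 1.993 | 2.247 | x1 | 0.052 | 0.022 |
| | λ2 | 3.122 | 0.311 | 2.967 | 3.277 | x2 | 0.059 | 0.010 |
| | Precision | 14.95 | 0.114 | 14.857 | 15.044 | | | |
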
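Table 3. Linear-inverse form on latent variable and observed variable estimates.

| Sample size | Latent variable | Posterior Mean (PM) | Posterior Standard Deviation (PSD) | Credible Interval (CI) lower | CI upper | Measured variable | Estimate | Standard Deviation |
|---|---|---|---|---|---|---|---|---|
| n=50 | λ1 | 1.882 | 0.043 | 1.827 | 1.937 | x1 | 0.075 | 0.020 |
| | λ2 | 2.742 | 0.028 | 2.696 | 2.788 | x2 | 0.023 | 0.017 |
| | Precision | 14.95 | 0.291 | 14.801 | 15.099 | | | |
| n=100 | λ1 | 1.972 | 0.024 | 1.929 | 2.015 | x1 | 0.055 | 0.010 |
| | λ2 | 2.835 | 0.023 | 2.793 | 2.877 | x2 | 0.031 | 0.021 |
| | Precision | 14.65 | 0.229 | 14.317 | 14.583 | | | |
| n=200 | λ1 | 1.988 | 0.017 | 1.826 | 2.102 | x1 | 0.054 | 0.006 |
| | λ2 | 2.901 | 0.016 | 2.790 | 3.012 | x2 | 0.032 | 0.024 |
| | Precision | 14.45 | 0.109 | 14.358 | 14.541 | | | |
| n=500 | λ1 | 2.021 | 0.011 | 1.992 | 2.050 | x1 | 0.052 | 0.015 |
| | λ2 | 3.003 | 0.015 | 2.969 | 3.037 | x2 | 0.050 | 0.022 |
| | Precision | 14.210 | 0.105 | 14.120 | 14.300 | | | |
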
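Table 4. Linear-absolute form on latent variable and observed variable estimates.

| Sample size | Latent variable | Posterior Mean (PM) | Posterior Standard Deviation (PSD) | Credible Interval (CI) lower | CI upper | Measured variable | Estimate | Standard Deviation |
|---|---|---|---|---|---|---|---|---|
| n=50 | λ1 | 2.036 | 0.032 | 1.986 | 2.086 | x1 | 0.043 | 0.018 |
| | λ2 | 2.824 | 0.034 | 2.773 | 2.875 | x2 | 0.027 | 0.022 |
| | Precision | 14.500 | 0.122 | 14.403 | 14.597 | | | |
| n=100 | λ1 | 1.908 | 0.022 | 1.867 | 1.949 | x1 | 0.047 | 0.017 |
| | λ2 | 2.903 | 0.026 | 2.858 | 2.948 | x2 | 0.043 | 0.025 |
| | Precision | 13.92 | 0.234 | 13.786 | 14.054 | | | |
| n=200 | λ1 | 1.893 | 0.017 | 1.857 | 1.929 | x1 | 0.054 | 0.017 |
| | λ2 | 2.809 | 0.023 | 2.767 | 2.851 | x2 | 0.041 | 0.024 |
| | Precision | 13.85 | 0.311 | 13.696 | 14.005 | | | |
| n=500 | λ1 | 1.806 | 0.031 | 1.757 | 1.855 | x1 | 0.048 | 0.019 |
| | λ2 | 2.788 | 0.035 | 2.736 | 2.840 | x2 | 0.044 | 0.022 |
| | Precision | 13.55 | 0.433 | 13.367 | 13.732 | | | |
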
First, the posterior means of the loadings (λ1 and λ2) are somewhat smaller under the different heteroscedastic conditions with the informative priors, as observed in Tables 6 and 7. Second, the factor variance is larger under our model with informative priors, likely because the informative prior placed more density on larger values of the posterior standard deviation. The model fit was evaluated using the PPP values shown in Table 5, and it was observed that the linear form is the best, with minimum PPP value as sample size increases. This was also revealed by the downward slope of the model as the sample size increases from 50 to 500, shown in Figure 1b, when compared with Figures 1a, 2a and 2b.
Table 6. Latent variable estimates at different sample sizes under the double-logarithmic and linear forms.

| Sample size | Latent variable | Double logarithmic PM | Double logarithmic PSD | Double logarithmic CI lower | Double logarithmic CI upper | Linear PM | Linear PSD | Linear CI lower | Linear CI upper |
|---|---|---|---|---|---|---|---|---|---|
| N=50 | λ1 | 2.001 | 0.231 | 1.868 | 2.134 | 2.110 | 0.230 | 1.977 | 2.243 |
| | λ2 | 2.283 | 0.538 | 2.080 | 2.486 | 2.554 | 0.201 | 2.430 | 2.678 |
| N=100 | λ1 | 2.021 | 0.312 | 1.866 | 2.176 | 2.020 | 0.123 | 1.923 | 2.117 |
| | λ2 | 2.478 | 0.562 | 2.270 | 2.686 | 2.601 | 0.356 | 2.436 | 2.766 |
| N=200 | λ1 | 2.032 | 0.432 | 1.850 | 2.214 | 2.011 | 0.174 | 1.895 | 2.127 |
| | λ2 | 2.770 | 0.832 | 2.517 | 3.023 | 2.705 | 0.456 | 2.518 | 2.892 |
| N=500 | λ1 | 2.100 | 0.445 | 1.915 | 2.285 | 2.005 | 0.253 | 1.866 | 2.144 |
| | λ2 | 2.888 | 1.564 | 2.541 | 3.234 | 3.102 | 0.575 | 2.892 | 3.312 |

Note: Posterior mean (PM), posterior standard deviation (PSD), credible interval (CI).

Table 7. Latent variable estimates at different sample sizes under the linear-inverse and linear-absolute forms.

| Sample size | Latent variable | Linear-inverse PM | Linear-inverse PSD | Linear-inverse CI lower | Linear-inverse CI upper | Linear-absolute PM | Linear-absolute PSD | Linear-absolute CI lower | Linear-absolute CI upper |
|---|---|---|---|---|---|---|---|---|---|
| N=50 | λ1 | 2.101 | 0.352 | 1.937 | 2.265 | 1.732 | 0.311 | 1.577 | 1.887 |
| | λ2 | 2.637 | 0.528 | 2.436 | 2.838 | 2.582 | 0.583 | 2.370 | 2.794 |
| N=100 | λ1 | 1.982 | 0.421 | 1.802 | 2.162 | 1.810 | 0.252 | 1.671 | 1.949 |
| | λ2 | 2.754 | 0.192 | 2.633 | 2.875 | 2.634 | 0.375 | 2.464 | 2.804 |
| N=200 | λ1 | 1.975 | 0.476 | 1.784 | 2.166 | 1.820 | 0.211 | 1.696 | 1.947 |
| | λ2 | 2.814 | 0.901 | 2.551 | 3.077 | 2.723 | 0.766 | 2.480 | 2.966 |
| N=500 | λ1 | 2.111 | 0.488 | 1.917 | 2.305 | 1.920 | 0.145 | 1.815 | 2.026 |
| | λ2 | 3.073 | 1.102 | 2.782 | 3.364 | 2.902 | 0.331 | 2.743 | 3.062 |

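Table 5. Comparison at varying sample sizes of different heteroscedastic forms.

| Sample size | Double logarithmic LogLik | Double logarithmic PPP | Linear LogLik | Linear PPP | Linear-inverse LogLik | Linear-inverse PPP | Linear-absolute LogLik | Linear-absolute PPP |
|---|---|---|---|---|---|---|---|---|
| N=50 | -17.577 | 0.538 | -17.309 | 0.501 | -19.701 | 0.567 | -20.065 | 0.560 |
| N=100 | -24.324 | 0.543 | -43.058 | 0.523 | -16.214 | 0.544 | -19.777 | 0.544 |
| N=200 | -29.427 | 0.541 | -44.935 | 0.545 | -15.305 | 0.540 | -19.547 | 0.532 |
| N=500 | -35.510 | 0.482 | -60.920 | 0.570 | -14.494 | 0.531 | -18.171 | 0.506 |
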
Figure 1.

Plot of log likelihood and posterior predictive probability (PPP) at various sample sizes under (a) the double logarithmic form and (b) the linear form.

Figure 2.

Plot of log likelihood and posterior predictive probability (PPP) at various sample sizes under (a) the linear-inverse form and (b) the linear-absolute form.

As an improvement on the maximum likelihood method, Bayesian estimation treats the parameters as random, with an informative prior distribution (here drawn from the conjugate family). Once the data are simulated or collected, they are combined with the prior distribution using Bayes' theorem, and the posterior distribution is then calculated, reflecting both the prior knowledge and the simulated data. The joint posterior distribution is summarized using MCMC simulation techniques in terms of lower-dimensional summary statistics such as the posterior mean and the posterior standard deviation. We observe that the structural and measurement equations obtained from this study are adequate, and in general we could accept the proposed model.
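The summary statistics reported in the tables (PM, PSD and a credible interval) can be computed from a vector of MCMC draws as in the sketch below. This is a generic illustration, not the authors' R code: the function names, the equal-tailed percentile interval, and the linear-interpolation quantile rule are assumptions, and the text does not state how the reported intervals were constructed.

```python
import statistics


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def summarize_draws(draws, burn_in=0, level=0.95):
    """Posterior mean, posterior standard deviation, equal-tailed interval."""
    kept = sorted(draws[burn_in:])  # drop burn-in, then sort for quantiles
    tail = (1.0 - level) / 2.0
    return {
        "PM": statistics.mean(kept),
        "PSD": statistics.stdev(kept),
        "CI": (quantile(kept, tail), quantile(kept, 1.0 - tail)),
    }


# Deterministic toy "draws" so the summary is easy to check by hand.
example = summarize_draws([float(k) for k in range(101)])
```

In practice `draws` would hold the retained Gibbs draws for one parameter, with `burn_in` set to the number of initial iterations to discard.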

Conclusion

In this research, the derived Bayesian estimators of a structural equation model in the presence of different forms of heteroscedastic error structure supported accurate statistical inference. The study has also addressed the problem of heteroscedasticity of known form using four different heteroscedastic conditions for both linear and quadratic forms, and it has modified the homogeneous error structure to a heteroscedastic error structure in the Bayesian structural equation model. The linear form outperformed the other forms of heteroscedastic error structure and can therefore accommodate data that violate the homogeneous variance assumption, provided an appropriate informative prior is updated. As an area of further research, these heteroscedastic error structure models can also be tested with an appropriate noninformative prior. This approach thus provides an alternative to the existing classical method, which depends solely on the sample information.

Data availability

Underlying data

All data underlying the results are available as part of the article and no additional source data are required.

Extended data

Figshare: RCODE BSEM.docx. https://doi.org/10.6084/m9.figshare.19299851. Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Reviewer reports

Reviewer report 1

I am happy with the corrections in the revised paper. It was improved. So, the current version of this manuscript is suitable for indexing. Good luck.

Is the work clearly and accurately presented and does it cite the current literature? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Is the study design appropriate and is the work technically sound? Yes
Are the conclusions drawn adequately supported by the results? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes

Reviewer Expertise: Econometrics, R Programming, Panel Data, Time Series, Computational Statistics, Data Analysis, R Statistical Packages, Statistical Modeling, Nonparametric Models, Robust Regression.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer report 2

This paper investigated the impact of informative prior on Bayesian structural equation model with heteroscedastic error structure. Four different forms of heteroscedastic error structures were considered. The suggested Bayesian approach provides an alternative approach to the existing classical method which depends solely on the sample information. The results indicate that the suggested Bayesian estimation method is more efficient than the existing classical method. In my opinion, the paper offers a good contribution. So, I recommend accepting this paper, but after making the following modifications to improve the manuscript:

1. I think the title of the paper needs improvement. I suggest the following title: "Bayesian estimation of structural equation modelling with non-homogenous error structure".
2. In the "abstract" section, the findings or research results should be introduced briefly.
3. In the "introduction" section, the introduction did not contain enough background information. Also discuss the similar work that has been done in this area to give a detailed view of this work. The authors should add more papers related to the Bayesian estimation of structural equation modelling.
4. In the "methods" section, the authors should define each symbol given in each equation.
5. In the "conclusion" section, the limitations and future research directions should be mentioned.

Is the work clearly and accurately presented and does it cite the current literature? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Is the study design appropriate and is the work technically sound? Yes
Are the conclusions drawn adequately supported by the results? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes

Reviewer Expertise: Applied statistics, Econometric models.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer report 3

ABSTRACT
General comment under this section: the researchers can add their findings. In this paper, the research team investigates the impact of informative prior on Bayesian structural equation model (BSEM) with heteroscedastic error structure. The drawback of homogeneous error structure was addressed by considering the non-homogenous error structure.

INTRODUCTION
General comment under this section: the introductory part was well presented and detailed with relevant citations, even though the researchers can still explore more. Bayesian structural equation modelling (BSEM) analyses the relationship between the observed, unobserved and latent variables within the Bayesian context. The likelihood is the un-normalized posterior distribution when expressed in terms of the unknown parameters θ for fixed values of y. The study explores the BSEM using different forms of heteroscedastic error structure. Other studies abound on classical methods and Bayesian methods with focus on homogeneous variance. 8,19,22,25

METHODS
A Gibbs sampler was developed to estimate SEM with reflective measurement indicators. 1,11,12 The SEM used is composed of a measurement equation and a structural equation. 9 To enable Gibbs sampling from full conditional posterior distributions, natural conjugate prior distributions for the unknown parameters were considered. The heteroscedastic error structures with different functional forms of error variance under consideration are the double logarithmic form, linear form, linear-inverse form and linear-absolute form, as expressed in equations 17, 18, 19 and 20. The Markov chain Monte Carlo (MCMC) simulation method of Gibbs sampling was employed.

SIMULATION
General comment under this section: the methods were well presented and the simulation study well organized. Different functional forms of heteroscedastic error structure were considered, with sample sizes of 50, 100, 200 and 500. Hyper-parameters were arbitrarily chosen for the simulation using the Gibbs sampler, an MCMC method. To assess the prior sensitivity, factor loading and error precision followed multivariate normal and inverse gamma distributions respectively. The posterior estimate is used to assess the performance of the posterior simulation technique. In order to evaluate the Bayesian model fit, the researchers used the posterior predictive probability (PPP) procedure. 4,5,7,24

RESULTS
General comment under this section: the obtained results in this section indicate the correct performances with increased sample sizes and the incorporation of informative priors. This gives the results for the latent and observed variables at various sample sizes for the four heteroscedastic error conditions considered, using the assumed values for the estimates, which are λ1 = 2.0, λ2 = 3.0 and precision 15.0. The posterior means of loadings λ1 and λ2 are somewhat smaller under different heteroscedastic conditions with the informative priors. It was observed that the linear form is the best with minimum PPP value as sample size increases. It was also revealed by the downward slope of the model as the sample size increases from 50 to 500. It was observed that the structural and measurement equations obtained from this study are adequate and in general could be accepted for the proposed model.

CONCLUSION
General comment: this section flows with the contents of the paper and is well presented. The study has been able to address sufficiently the problem of heteroscedasticity of known form using four different heteroscedastic conditions for both linear and quadratic forms. It has also successfully modified the homogenous error structure to heteroscedastic error structure in Bayesian structural equation model. Thus, the approach provides an alternative approach to the existing classical method which depends solely on sample information. The manuscript is well written, followed the format of the journal and has substance; the manuscript can be approved.

Is the work clearly and accurately presented and does it cite the current literature? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Is the study design appropriate and is the work technically sound? Yes
Are the conclusions drawn adequately supported by the results? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes

Reviewer Expertise: Environmental Statistics and Econometrics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.