Samuel Asumadu Sarkodie, Phebe Asantewaa Owusu.
Abstract
Panel data commonly exhibit, inter alia, missing values, cross-sectional dependence, serial correlation, small time period bias, omitted variable bias, country-specific fixed effects, time effects, heterogeneous effects and convergence. These characteristics often lead to misspecification and spurious regression, which affects the consistency and robustness of the model. A more sophisticated panel estimation technique that accounts for these attributes and challenges is therefore worthwhile. The novel panel bootstrap-corrected fixed-effects estimator (xtbcfe) and heterogeneous dynamics (panelhetero) recommended in this study meet almost all the requirements for robust and consistent panel estimation, with an interface for user modifications. We further demonstrate how to use the empirical CDF, moments and kernel density estimation to investigate heterogeneous effects. Because applying the xtbcfe and panelhetero algorithms is complex, we provide a step-by-step procedure and guidelines for the estimation approach. We apply the xtbcfe and panelhetero algorithms to the global estimation of mortality, disability-adjusted life years and welfare cost from exposure to ambient air pollution.
Importantly, the xtbcfe algorithm can be applied to any panel data-based study in social science, environmental science, environmental economics, health economics and energy economics, among others.
• Procedures useful for data imputation and for transforming negative variables in time series, cross-sectional and panel data are presented.
• Contrary to traditional models, we show how a novel approach can be modified and used to examine the degree of heterogeneous effects across the cross-sectional units of panel data.
• We demonstrate how the dynamic panel bootstrap-corrected fixed-effects estimator is useful in estimating higher-order panel data models and in accounting for challenges such as omitted-variable bias, convergence, cross-section dependence and heterogeneous effects.
• We apply the imputation technique and the panelhetero and xtbcfe algorithms to examine the nexus between ambient air pollution and health outcomes.
Keywords: Bias correction; Bootstrap-corrected fixed-effects estimator; Dynamic panel modeling; Heterogeneous dynamics; Missing data imputation; Monte Carlo simulation; Treatment of Negative values; Within estimator; panelhetero; xtbcfe
Year: 2020 PMID: 32939352 PMCID: PMC7479353 DOI: 10.1016/j.mex.2020.101045
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Fig. 1. Procedure for missing data imputation in Orange software.
Characteristics of data during pre-imputation and post-imputation.
| Data | Name | Center | Dispersion | Min. | Max. | Missing |
|---|---|---|---|---|---|---|
| Raw Data | PM 2.5 | 29.93 | 0.61 | 5.76 | 103.16 | 2608 (57%) |
| Imputed Data | PM 2.5 | 29.93 | 0.61 | 5.76 | 103.16 | 0 (0%) |
| Raw Data | TOT_DALY | 9.26 | 0.51 | 1.72 | 39.16 | 0 (0%) |
| Imputed Data | TOT_DALY | 9.26 | 0.51 | 1.72 | 39.16 | 0 (0%) |
| Raw Data | TOT_MOR | 309.21 | 0.63 | 40.02 | 1257.4 | 0 (0%) |
| Imputed Data | TOT_MOR | 309.21 | 0.63 | 40.02 | 1257.4 | 0 (0%) |
| Raw Data | TOT_MOR_V | 13959 | 5.17 | 4.83 | 1029848 | 0 (0%) |
| Imputed Data | TOT_MOR_V | 13959 | 5.17 | 4.83 | 1029848 | 0 (0%) |
| Raw Data | TOT_SC_V | 17281.76 | 4.2 | 1.47 | 1580136 | 268 (4%) |
| Imputed Data | TOT_SC_V | 17836.43 | 4.35 | 1.47 | 1580136 | 0 (0%) |
Notes: Raw Data is the original dataset with missing values whereas Imputed Data denotes the output of the imputation algorithm. Legend: TOT_MOR is the total mortality from exposure to outdoor PM2.5 and ozone, TOT_MOR_V denotes premature deaths from exposure to outdoor PM2.5 and ozone, TOT_DALY is the total Disability-Adjusted Life Year from exposure to outdoor PM2.5 and ozone, TOT_SC_V is the total welfare cost of premature deaths from exposure to outdoor PM2.5 and ozone and PM 2.5 is the exposure to ambient particulate matter. The Table presented is reproduced from Owusu and Sarkodie [6].
First-generation panel unit root tests.
| Variable | Breitung (Level) | Breitung (First Difference) | IPS (Level) | IPS (First Difference) |
|---|---|---|---|---|
| PM 2.5 | -39.132*** | -37.890*** | -36.528*** | -50.915*** |
| TOT_DALY | 21.974 | -25.140*** | 4.966 | -30.589*** |
| TOT_MOR | 21.255 | -27.366*** | 8.697 | -31.327*** |
| TOT_MOR_V | 32.947 | -28.873*** | 22.250 | -32.250*** |
| TOT_SC_V | 30.867 | -25.738*** | 27.921 | -31.840*** |
| TOT_SC_V_TOT_DALY | 22.479 | -27.153*** | 18.659 | -33.700*** |
Notes: *** denotes rejection of the null hypothesis of unit root at 1% significance level. The Table presented is reproduced from Owusu and Sarkodie [6].
Test for cross-sectional dependence.
| Variable | CD-test | p-value | average joint T | mean ρ | mean abs(ρ) |
|---|---|---|---|---|---|
| TOT_MOR | 5.751 | 0.000 | 28 | 0.010 | 0.650 |
| TOT_MOR_V | 231.128 | 0.000 | 28 | 0.32 | 0.74 |
| PM 2.5 | 2.425 | 0.015 | 28 | 0.000 | 0.220 |
| TOT_DALY | 32.001 | 0.000 | 28 | 0.040 | 0.650 |
| TOT_SC_V | 405.366 | 0.000 | 28 | 0.560 | 0.670 |
| TOT_SC_V_TOT_DALY | 184.500 | 0.000 | 28 | 0.250 | 0.590 |
Notes: Under the null hypothesis of cross-section independence, CD ~ N(0,1). P-values close to zero indicate that the data are correlated across panel groups. The Table presented is reproduced from Owusu and Sarkodie [6].
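To make the CD statistic in the table above concrete, the following is a minimal sketch of a Pesaran-type CD test for a balanced panel, written in Python with NumPy. It is an illustration only, not the routine used to produce the table: the function name `pesaran_cd` and the (N, T) array layout are assumptions, and the p-value is two-sided under the N(0,1) null stated in the notes.

```python
import numpy as np
from math import erfc, sqrt


def pesaran_cd(x):
    """CD test sketch for a balanced panel x of shape (N units, T periods).

    Returns (CD statistic, two-sided p-value, mean rho, mean |rho|).
    Hypothetical helper for illustration; assumes no missing values.
    """
    x = np.asarray(x, dtype=float)
    n, t = x.shape
    rho = np.corrcoef(x)                 # N x N pairwise correlations over time
    pairs = rho[np.triu_indices(n, k=1)]  # each pair (i < j) counted once
    cd = sqrt(2.0 * t / (n * (n - 1))) * pairs.sum()
    p_value = erfc(abs(cd) / sqrt(2.0))   # two-sided p-value under CD ~ N(0,1)
    return cd, p_value, pairs.mean(), np.abs(pairs).mean()


# Illustrative data: 30 units, 28 periods (28 is the "average joint T" above).
rng = np.random.default_rng(0)
independent = rng.normal(size=(30, 28))
with_common_shock = independent + 2.0 * rng.normal(size=28)

print(pesaran_cd(independent))
print(pesaran_cd(with_common_shock))
```

As a consistency check on the p-value formula, a CD statistic of 2.425 gives a two-sided p-value of about 0.015, which agrees with the PM 2.5 row of the table.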
Moments estimation for Mortality rates (lnTOT_MOR).
| Parameters | Estimate | S.E | Low* | High** |
|---|---|---|---|---|
| Mean of Mean | 5.539 | 0.042 | 5.470 | 5.621 |
| Mean of Autocovariance | 0.040 | 0.004 | 0.031 | 0.046 |
| Mean of Autocorrelation | 1.036 | 0.014 | 1.010 | 1.062 |
| Variance of Mean | 0.373 | 0.031 | 0.319 | 0.435 |
| Variance of Autocovariance | 0.002 | 0.000 | 0.000 | 0.002 |
| Variance of Autocorrelation | -0.011 | 0.005 | -0.020 | -0.001 |
| Correlation between Mean and Autocovariance | -0.278 | 0.095 | -0.469 | -0.088 |
| Correlation between Mean and Autocorrelation | -0.361 | 0.131 | -0.599 | -0.123 |
| Correlation between Autocovariance and Autocorrelation | 0.447 | 0.062 | 0.336 | 0.573 |
Notes: *, ** denote the lower and upper bounds of the 95% confidence intervals for the moments, based on bootstrapping across cross-sectional units; S.E denotes the standard errors of the estimates, based on bootstrapping across cross-sectional units.
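The structure of the moments table can be illustrated with a short Python sketch: compute each unit's mean, autocovariance and autocorrelation, summarise them across units, then bootstrap across cross-sectional units for standard errors and percentile intervals. This is a naive sketch under stated assumptions, not the panelhetero routine: it applies no bias correction, it assumes autocovariance of order 0 and autocorrelation of order 1, and the names `unit_moments` and `moment_table` are hypothetical. It will therefore not reproduce the estimates above.

```python
import numpy as np

NAMES = [
    "Mean of Mean", "Mean of Autocovariance", "Mean of Autocorrelation",
    "Variance of Mean", "Variance of Autocovariance",
    "Variance of Autocorrelation",
    "Correlation between Mean and Autocovariance",
    "Correlation between Mean and Autocorrelation",
    "Correlation between Autocovariance and Autocorrelation",
]


def unit_moments(y):
    """Per-unit mean, order-0 autocovariance and order-1 autocorrelation."""
    y = np.asarray(y, dtype=float)          # shape (N units, T periods)
    mu = y.mean(axis=1)
    dev = y - mu[:, None]
    acov0 = (dev ** 2).mean(axis=1)
    acov1 = (dev[:, 1:] * dev[:, :-1]).mean(axis=1)
    return mu, acov0, acov1 / acov0


def _summaries(cols):
    """Means, variances and pairwise correlations of the unit-level moments."""
    corr = np.corrcoef(cols, rowvar=False)
    return np.concatenate([
        cols.mean(axis=0),
        cols.var(axis=0, ddof=1),
        [corr[0, 1], corr[0, 2], corr[1, 2]],
    ])


def moment_table(y, n_boot=500, seed=0):
    """Return {name: (estimate, S.E., low, high)} using a unit-level bootstrap."""
    cols = np.column_stack(unit_moments(y))
    est = _summaries(cols)
    rng = np.random.default_rng(seed)
    n = cols.shape[0]
    # Resample whole cross-sectional units with replacement.
    draws = np.array([_summaries(cols[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    se = draws.std(axis=0, ddof=1)
    low, high = np.percentile(draws, [2.5, 97.5], axis=0)
    return {k: (e, s, lo, hi)
            for k, e, s, lo, hi in zip(NAMES, est, se, low, high)}
```

Calling `moment_table(y)` on an (N, T) array of, say, log mortality rates returns one tuple per row of the table, in the same order as the column headings Estimate, S.E, Low and High.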
Fig. 2. Empirical CDF Estimation for Mean (a) Mortality (b) Ambient Air Pollution.
Fig. 3. Empirical CDF Estimation for Variance (a) Mortality (b) Ambient Air Pollution.
Fig. 4. Empirical CDF Estimation for Autocorrelation of order 1 (a) Mortality (b) Ambient Air Pollution.
Fig. 5. Kernel Density Estimation for Mean (a) Mortality (b) Ambient Air Pollution.
Fig. 7. Kernel Density Estimation for Autocorrelation of order 1 (a) Mortality (b) Ambient Air Pollution.
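The quantities plotted in the empirical CDF and kernel density figures can be sketched as follows. This is a generic NumPy illustration, not the panelhetero implementation: it takes a vector of unit-specific statistics (for example, one mean per country), and the Gaussian kernel with a normal-reference bandwidth is an assumption, as are the function names `ecdf` and `gaussian_kde`.

```python
import numpy as np


def ecdf(values, grid):
    """Share of unit-specific values that are <= each grid point."""
    v = np.sort(np.asarray(values, dtype=float))
    return np.searchsorted(v, grid, side="right") / v.size


def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate evaluated on grid."""
    v = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the paper's choice).
        bandwidth = 1.06 * v.std(ddof=1) * v.size ** (-0.2)
    z = (grid[:, None] - v[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (v.size * bandwidth)


# Illustrative unit-specific means for 100 hypothetical cross-sectional units.
rng = np.random.default_rng(1)
unit_means = rng.normal(loc=5.5, scale=0.6, size=100)
grid = np.linspace(unit_means.min() - 3.0, unit_means.max() + 3.0, 2001)

cdf_values = ecdf(unit_means, grid)
density = gaussian_kde(unit_means, grid)
```

A steep empirical CDF or a narrow density indicates that units are similar in the chosen statistic, whereas a flat CDF or a wide or multimodal density points to heterogeneity across units.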
Second-generation panel unit root tests.
| Variable | CIPS^a (Level) | CIPS^a (First Difference) | PESCADF^b (Level) | PESCADF^b (First Difference) |
|---|---|---|---|---|
| PM 2.5 | -4.533*** | -6.071*** | -15.390*** | -34.090*** |
| TOT_DALY | -1.551 | -3.366*** | 1.872 | -11.245*** |
| TOT_MOR | -1.387 | -3.420*** | 4.752 | -13.206*** |
| TOT_MOR_V | -1.508 | 3.243*** | 2.228 | -2.741*** |
| TOT_SC_V | -1.112 | -3.915*** | 27.921 | -4.504*** |
| TOT_SC_V_TOT_DALY | -1.861 | -4.236*** | 1.336 | -7.357*** |
Notes: ^a H0 (homogeneous non-stationary): b_i = 0 for all i; ^b the null hypothesis assumes that all series are non-stationary in a heterogeneous panel with cross-sectional dependence. The Table presented is reproduced from Owusu and Sarkodie [6].
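The null hypothesis b_i = 0 in the notes refers to the coefficient on the lagged level in a cross-sectionally augmented Dickey-Fuller (CADF) regression. The sketch below shows one common form of that regression, with an intercept, no lagged differences and no trend, and averages the unit-level t-statistics into a CIPS-type statistic. The specification, the function names `cadf_t` and `cips`, and the simulated data are assumptions for illustration; critical values must be taken from published tables and are not reproduced here.

```python
import numpy as np


def cadf_t(y_i, ybar):
    """t-statistic on b_i in:
    dy_it = a_i + b_i*y_i,t-1 + c_i*ybar_t-1 + d_i*dybar_t + e_it
    """
    dy = np.diff(y_i)
    x = np.column_stack([
        np.ones(dy.size),   # intercept a_i
        y_i[:-1],           # lagged level, coefficient b_i
        ybar[:-1],          # lagged cross-sectional average
        np.diff(ybar),      # differenced cross-sectional average
    ])
    beta, *_ = np.linalg.lstsq(x, dy, rcond=None)
    resid = dy - x @ beta
    sigma2 = resid @ resid / (dy.size - x.shape[1])
    cov = sigma2 * np.linalg.inv(x.T @ x)
    return beta[1] / np.sqrt(cov[1, 1])


def cips(y):
    """Average of unit-level CADF t-statistics for a balanced (N, T) panel."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean(axis=0)
    return float(np.mean([cadf_t(row, ybar) for row in y]))


# Illustrative panels: random walks (unit root) versus white noise (stationary).
rng = np.random.default_rng(2)
random_walks = np.cumsum(rng.normal(size=(20, 50)), axis=1)
white_noise = rng.normal(size=(20, 50))

print(cips(random_walks), cips(white_noise))
```

A strongly negative statistic, as for the stationary panel here, is evidence against the unit-root null; in the table this pattern appears after first differencing for the health outcome variables.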
Parameter estimates of ambient air pollution and health outcomes using dynamic panel bootstrap-corrected fixed-effects.
| Estimation | Mortality^a | Premature | DALYs | Welfare Cost | Mortality^b | Mortality^c |
|---|---|---|---|---|---|---|
| γ | 0.310*** | 0.274*** | 0.315*** | 0.211** | 0.841*** | 0.831*** |
| DALYs | — | — | — | — | 0.147*** | 0.108*** |
| Welfare Cost | — | — | — | — | 0.004*** | -0.012*** |
| Welfare Cost × DALYs | — | — | — | — | — | 0.008*** |
| PM2.5 | 0.001*** | 0.001*** | 0.001*** | 0.048*** | 0.005*** | 0.005*** |
| 1992 | — | — | — | 0.073 | 0.004** | 0.004** |
| 1993 | 0.000 | 0.000 | 0.000 | -0.012 | 0.007*** | 0.007*** |
| 1994 | -0.008*** | -0.009*** | -0.007*** | 0.003 | 0.000 | 0.001 |
| 1995 | -0.010*** | -0.011*** | -0.010*** | 0.078 | -0.002 | -0.001 |
| 1996 | -0.001 | -0.002 | -0.000 | 0.169** | 0.005** | 0.006*** |
| 1997 | -0.006*** | -0.006*** | -0.003 | 0.066 | 0.005* | 0.006** |
| 1998 | -0.008*** | -0.008*** | -0.006** | 0.171** | 0.003 | 0.004* |
| 1999 | -0.012*** | -0.013*** | -0.010*** | 0.154* | 0.000 | 0.001 |
| 2000 | -0.017*** | -0.018*** | -0.015*** | 0.148** | -0.005* | -0.004 |
| 2001 | -0.005** | -0.006*** | -0.003 | 0.279*** | 0.004 | 0.005* |
| 2002 | -0.005** | -0.006*** | -0.005** | 0.225** | 0.008*** | 0.009*** |
| 2003 | -0.008*** | -0.009*** | -0.007*** | 0.211*** | 0.007** | 0.008*** |
| 2004 | -0.016*** | -0.016*** | -0.013*** | 0.303*** | 0.001 | 0.002 |
| 2005 | -0.011*** | -0.012*** | -0.009*** | 0.469*** | 0.002 | 0.004 |
| 2006 | -0.004* | -0.005** | -0.000 | 0.445*** | 0.009*** | 0.011*** |
| 2007 | -0.006*** | -0.007*** | -0.003 | 0.448*** | 0.010*** | 0.012*** |
| 2008 | -0.009*** | -0.010*** | -0.007*** | 0.510*** | 0.008** | 0.009*** |
| 2009 | -0.010*** | -0.011*** | -0.008*** | 0.422*** | 0.007** | 0.009** |
| 2010 | -0.012*** | -0.013*** | -0.011*** | 0.534*** | 0.006* | 0.008** |
| 2011 | 0.005* | 0.003 | -0.006** | 0.533*** | 0.021*** | 0.022*** |
| 2012 | -0.026*** | -0.027*** | -0.025*** | 0.521*** | 0.000 | 0.002 |
| 2013 | -0.015*** | -0.017*** | -0.014*** | 0.568*** | 0.003 | 0.005 |
| 2014 | -0.018*** | -0.021*** | -0.017*** | 0.650*** | 0.002 | 0.004*** |
| 2015 | 0.031*** | 0.028*** | 0.031*** | 0.619*** | 0.044*** | 0.046 |
| 2016 | -0.041*** | -0.042*** | -0.041*** | 0.747*** | -0.003 | 0.000*** |
| 2017 | 0.012*** | 0.008*** | 0.012*** | 0.652*** | 0.028*** | 0.030 |
| Convergence | YES | YES | YES | YES | YES | YES |
| MWALD test‡ (Prob>chi) | 0.000*** | 0.000*** | 0.000*** | 0.000*** | 0.000*** | 0.000*** |
Notes: [.] denotes bootstrapped standard errors; confidence intervals are bootstrap 95% (percentile-based) and inference is performed with a non-parametric bootstrap; γ represents the lagged dependent variable; ***, **, * denote statistical significance at the 1, 5, and 10% levels.
a, b, c denote Mortality ~ f(Ambient air pollution), Mortality ~ f(DALYs, welfare cost and ambient air pollution) and Mortality ~ f(DALYs, welfare cost, ambient air pollution and the interaction between DALYs and welfare cost), respectively. ‡ denotes the modified Wald test used as a post-estimation technique to examine groupwise heteroskedasticity under the null hypothesis $H_0: \sigma_i^2 = \sigma^2$ for all $i$. Legend: DALY is the average total Disability-Adjusted Life Years from exposure to PM2.5 and ozone, MWALD means the modified Wald statistic, and Prob>chi is the probability of the Chi-squared test. The Table presented is reproduced from Owusu and Sarkodie [6].
Fig. 8. Post-estimation bootstrap-simulated distribution of autoregressive (AR) coefficients and their sum for the: (a) relationship between mortality and PM2.5; (b) relationship between premature deaths and PM2.5; (c) relationship between DALYs and PM2.5; (d) relationship between the welfare cost of premature deaths from exposure to PM2.5 and ozone, and PM2.5; (e) relationship between mortality versus PM2.5, DALYs, and the welfare cost of premature deaths from exposure to PM2.5 and ozone; (f) relationship between mortality versus PM2.5, DALYs, the welfare cost of premature deaths from exposure to PM2.5 and ozone, and the interaction effect of DALYs and the welfare cost of premature deaths from exposure to PM2.5 and ozone. Figure presented is reproduced from Owusu and Sarkodie [6].
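The idea behind a bootstrap-corrected fixed-effects estimator can be illustrated with a deliberately simplified Python sketch. It is not the xtbcfe algorithm: it covers only a pure AR(1) panel with unit fixed effects and no covariates or time effects, resamples residuals over time within each unit, and conditions on the observed initial values, all of which are assumptions made for brevity. The search looks for the γ at which the average within estimate on bootstrap samples equals the within estimate on the original data; all function names are hypothetical.

```python
import numpy as np


def fe_ar1(y):
    """Within (fixed-effects) estimate of gamma; y has shape (..., N, T+1)."""
    lag, cur = y[..., :-1], y[..., 1:]
    lag_d = lag - lag.mean(axis=-1, keepdims=True)
    cur_d = cur - cur.mean(axis=-1, keepdims=True)
    return (lag_d * cur_d).sum(axis=(-2, -1)) / (lag_d ** 2).sum(axis=(-2, -1))


def bootstrap_mean_fe(y, gamma, n_boot=100, seed=0):
    """Average within estimate over bootstrap panels generated under gamma."""
    rng = np.random.default_rng(seed)   # same seed every call: common random numbers
    n, t = y.shape[0], y.shape[1] - 1
    u = y[:, 1:] - gamma * y[:, :-1]
    alpha = u.mean(axis=1)              # implied unit fixed effects
    resid = u - alpha[:, None]
    idx = rng.integers(0, t, size=(n_boot, n, t))
    e_star = resid[np.arange(n)[None, :, None], idx]   # resample within unit
    y_star = np.empty((n_boot, n, t + 1))
    y_star[..., 0] = y[:, 0]            # condition on observed initial values
    for s in range(t):
        y_star[..., s + 1] = alpha + gamma * y_star[..., s] + e_star[..., s]
    return fe_ar1(y_star).mean()


def bcfe_ar1(y, n_boot=100, n_iter=30, tol=1e-4):
    """Iterate until the bootstrap mean of the within estimator matches the data."""
    target = float(fe_ar1(y))
    gamma = target
    for _ in range(n_iter):
        step = target - bootstrap_mean_fe(y, gamma, n_boot)
        gamma = float(np.clip(gamma + step, -0.99, 0.99))
        if abs(step) < tol:
            break
    return gamma


def simulate_panel(n=200, t=10, gamma=0.5, seed=1):
    """Simulated AR(1) panel with fixed effects, for illustration only."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)
    y = np.empty((n, t + 1))
    y[:, 0] = alpha / (1 - gamma) + rng.normal(size=n) / np.sqrt(1 - gamma ** 2)
    for s in range(t):
        y[:, s + 1] = alpha + gamma * y[:, s] + rng.normal(size=n)
    return y


panel = simulate_panel()
print("within estimate:", float(fe_ar1(panel)))
print("bootstrap-corrected:", bcfe_ar1(panel))
```

With a short time dimension the within estimate of γ is biased downward, and the corrected estimate should move back towards the value used in the simulation. The bootstrap draws of the within estimator are also the kind of object whose distribution a figure such as Fig. 8 displays.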
| Subject Area: | Environmental Science, Health Economics, Econometrics |
| More specific subject area: | |
| Method name: | |
| Name and reference of original method: | Everaert, Gerdie, and Lorenzo Pozzi. "Bootstrap-based bias correction for dynamic panels." |
| Resource availability: | Data used in this study were extracted from the environmental risk and health database of the Organisation for Economic Co-operation and Development |