Hea-Jung Kim, Mihyang Bae, Daehwa Jin.
Abstract
In regression analysis, sample-selection bias arises when the dependent variable is only partially observed because of the way the sample was selected. This study introduces a Maximum Entropy (MaxEnt) process regression model, which places a MaxEnt prior distribution on its nonparametric regression function, and shows that this model includes the well-known Gaussian process regression (GPR) model as a special case. This special case, the GPR model, is then generalized to a robust sample-selection Gaussian process regression (RSGPR) model that handles non-normal data under sample selection. Several properties of the RSGPR model are established, including its stochastic representation, its distributional hierarchy, and the magnitude of the sample-selection bias. The paper uses these properties to develop a hierarchical Bayesian methodology for estimating the model. This methodology relies on a simple, computationally feasible Markov chain Monte Carlo algorithm that requires no analytical or numerical derivatives of the model's log-likelihood function. Simulation results demonstrate the performance of the RSGPR model in correcting sample-selection bias, in robustness to non-normality, and in prediction, and they attest to its good finite-sample performance.
Keywords: Gaussian process model; Markov chain Monte Carlo; bias correction; hierarchical Bayesian methodology; robust sample-selection MaxEnt process regression model; sample-selection bias
Year: 2018 PMID: 33265353 PMCID: PMC7512777 DOI: 10.3390/e20040262
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
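The abstract refers to the magnitude of the sample-selection bias. For orientation only, the sketch below works through the classical normal selection model (a Heckman-type setting), not the paper's RSGPR model. In that model, y = x'β + ε is observed only when the selection variable z'γ + u is positive, and (ε, u) is bivariate normal with correlation ρ, Var(ε) = σ² and Var(u) = 1. The mean of ε in the selected sample is then ρσλ(z'γ), where λ(a) = φ(a)/Φ(a) is the inverse Mills ratio, so least squares on the observed data is biased whenever ρ ≠ 0. The function names and simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf = phi, cdf = Phi


def inverse_mills(a: float) -> float:
    """Inverse Mills ratio lambda(a) = phi(a) / Phi(a)."""
    return STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


def selection_bias(index: float, rho: float, sigma: float) -> float:
    """Closed-form E[eps | z'gamma + u > 0] in the classical normal selection
    model (illustrative assumption, not the RSGPR model): rho * sigma * lambda(index)."""
    return rho * sigma * inverse_mills(index)


def simulated_bias(index: float, rho: float, sigma: float,
                   n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo check of selection_bias (hypothetical helper)."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        # eps = sigma * (rho * u + sqrt(1 - rho^2) * v) has corr(eps, u) = rho.
        eps = sigma * (rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        if index + u > 0:  # y is observed only when the selection variable is positive
            total += eps
            kept += 1
    return total / kept


if __name__ == "__main__":
    print(selection_bias(0.0, 0.5, 1.0), simulated_bias(0.0, 0.5, 1.0))
```

With ρ = 0 the closed-form bias is exactly zero, which matches the intuition that selection is harmless when the selection and outcome errors are uncorrelated.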
Figure 1. Graphs of the sample-selection bias and the difference in marginal effect of the k-th predictor.
Figure 2. Graphs of estimated regression functions (left panel) and predicted regression functions (right panel): (i) black lines are used for the true regression function; (ii) red dashed lines for the robust sample-selection Gaussian process regression (RSGPR) models; (iii) blue dashed lines for the Gaussian process regression (GPR) models.
Figure 3. Graphs of regression functions: estimated regression functions (left panel) and predicted regression functions (right panel).
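The figures contrast estimated regression functions (at observed inputs) with predicted regression functions (at new inputs). For the plain GPR special case named in the abstract, both are posterior means of a Gaussian process. The sketch below assumes a zero-mean GP prior with a squared-exponential covariance and Gaussian noise. It computes k(x*, X)(K + σ²I)⁻¹y with a small dense solver. The kernel, hyperparameter values and function names are illustrative assumptions rather than the paper's settings, and the RSGPR sample-selection correction is not included.

```python
import math


def sq_exp_kernel(a: float, b: float, length: float = 1.0, amp: float = 1.0) -> float:
    """Squared-exponential covariance (an assumed kernel choice)."""
    return amp ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior_mean(x_train, y_train, x_new, noise_var=0.1, length=1.0, amp=1.0):
    """Posterior mean k(x_new, X) (K + noise_var * I)^{-1} y of a zero-mean GP
    (hypothetical helper for the plain GPR case)."""
    n = len(x_train)
    K = [[sq_exp_kernel(x_train[i], x_train[j], length, amp)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, list(y_train))
    return [sum(sq_exp_kernel(x, x_train[i], length, amp) * alpha[i] for i in range(n))
            for x in x_new]
```

Evaluating `gp_posterior_mean` at the training inputs gives an estimated regression function, and evaluating it at new inputs gives a predicted one. For example, with `xs = [i / 4 for i in range(13)]`, `ys = [math.sin(x) for x in xs]` and `noise_var=1e-4`, the prediction at 1.1 should lie close to sin(1.1).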
Posterior Summary.
| True value | SGPR mean | SGPR s.d. | SGPR RMSE | SGPR MAB | SGPR MC error | GPR mean | GPR s.d. | GPR RMSE | GPR MAB | GPR MC error |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 2.831 | 0.308 | 0.351 | 0.426 | 0.018 | 2.094 | 0.104 | 0.912 | 0.800 | 0.002 |
|  | 0.380 | 0.376 | 0.563 | 0.287 | 0.064 | NA | NA | NA | NA | NA |
|  | 2.880 | 0.974 | 0.509 | 0.515 | 0.050 | 2.130 | 0.109 | 0.876 | 0.800 | 0.003 |
|  | 0.435 | 0.275 | 0.627 | 0.422 | 0.032 | NA | NA | NA | NA | NA |
s.d.: standard deviation; SGPR: sample-selection Gaussian process normal error regression; RMSE: root mean square error; MAB: mean absolute bias; MC: Monte Carlo; GPR: Gaussian process regression.
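The footnote names the summary criteria but not their formulas. The sketch below uses common definitions, which are assumptions here because the paper may define them differently. For replicated estimates θ̂₁, …, θ̂_R of a true value θ, RMSE = sqrt((1/R) Σ (θ̂_r − θ)²) and MAB = (1/R) Σ |θ̂_r − θ|. The MC error is estimated by non-overlapping batch means, which is one common choice among several. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def rmse(estimates, true_value):
    """Root mean square error of replicated estimates (assumed definition)."""
    return math.sqrt(mean((e - true_value) ** 2 for e in estimates))


def mean_absolute_bias(estimates, true_value):
    """Mean absolute deviation of estimates from the true value (assumed definition of MAB)."""
    return mean(abs(e - true_value) for e in estimates)


def batch_means_mc_error(draws, n_batches=20):
    """Monte Carlo standard error of a posterior mean via non-overlapping batch means
    (one assumed choice; other MC-error estimators exist)."""
    size = len(draws) // n_batches
    if n_batches < 2 or size == 0:
        raise ValueError("need at least 2 batches with at least one draw each")
    batch_avgs = [mean(draws[b * size:(b + 1) * size]) for b in range(n_batches)]
    return stdev(batch_avgs) / math.sqrt(n_batches)
```

Under these definitions MAB can never exceed RMSE, because the mean absolute error is at most the root mean square error. Some rows of the table report MAB above RMSE, so the paper's definitions evidently differ, and this sketch is only a generic illustration.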