
On a Robust MaxEnt Process Regression Model with Sample-Selection.

Hea-Jung Kim1, Mihyang Bae1, Daehwa Jin1.   

Abstract

In a regression analysis, a sample-selection bias arises when a dependent variable is partially observed as a result of the sample selection. This study introduces a Maximum Entropy (MaxEnt) process regression model that assumes a MaxEnt prior distribution for its nonparametric regression function and finds that the MaxEnt process regression model includes the well-known Gaussian process regression (GPR) model as a special case. Then, this special MaxEnt process regression model, i.e., the GPR model, is generalized to obtain a robust sample-selection Gaussian process regression (RSGPR) model that deals with non-normal data in the sample selection. Various properties of the RSGPR model are established, including the stochastic representation, distributional hierarchy, and magnitude of the sample-selection bias. These properties are used in the paper to develop a hierarchical Bayesian methodology to estimate the model. This involves a simple and computationally feasible Markov chain Monte Carlo algorithm that avoids analytical or numerical derivatives of the log-likelihood function of the model. The performance of the RSGPR model in terms of the sample-selection bias correction, robustness to non-normality, and prediction is demonstrated through simulation results that attest to its good finite-sample performance.

Keywords:  Gaussian process model; Markov chain Monte Carlo; bias correction; hierarchical Bayesian methodology; robust sample-selection MaxEnt process regression model; sample-selection bias

Year:  2018        PMID: 33265353      PMCID: PMC7512777          DOI: 10.3390/e20040262

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

The Bayesian nonparametric method is a powerful approach for regression problems when the shape of the underlying regression function is unknown, the function may be difficult to evaluate analytically, or other requirements such as design costs may complicate the process of information acquisition. Bayesian orthogonal basis expansion regression, spline smoothing regression, wavelet regression, and Gaussian process regression (GPR) are powerful nonparametric Bayesian approaches that address these regression problems. These regression techniques have been extensively used in fields such as psychology, data science, engineering, neuroscience, and fishery science [1,2,3,4,5]. Sample selection (or incidental truncation) often arises in a wide variety of practical regression problems, and standard analysis of data with sample selection leads to biased results because the selected sample represents only a subset of the full population; see [6,7,8]. Regression analysis is also sensitive to outliers and to departures from normality of the dependent variable (see [9,10]). Thus, when one implements nonparametric Bayesian regression with non-normal data subject to sample selection, the selection mechanism and the non-normality of the data must be jointly modeled with the Bayesian nonparametric regression model to correct the sample-selection bias and to implement a robust statistical inference. In this regard, several estimation procedures have been considered in the literature to produce robust linear regression models subject to sample selection, including, for instance, [6,7] for frequentist methods and [8,9,11] for Bayesian methods. See [9,12] for robust Bayesian sample-selection models other than the regression model. However, no studies have generalized a nonparametric regression model to deal with non-normal data subject to sample selection.
The objective of this paper is to introduce the Maximum Entropy (MaxEnt) process regression model as a new Bayesian nonparametric regression model and then to generalize this model into a robust sample-selection Bayesian nonparametric regression model, along with its inferential methodology. The MaxEnt process regression model is obtained by assuming a MaxEnt prior distribution for its nonparametric regression function, and it includes the GPR model as a special case. This establishes a relationship between the MaxEnt nonparametric regression approach and the rationale for conducting a Gaussian process regression analysis. This study focuses on the GPR model as a special MaxEnt process regression model and a powerful model for nonparametric regression problems. Then, the GPR model is generalized to obtain a robust sample-selection Gaussian process regression (RSGPR) model. The RSGPR model extends the GPR model to account for the sample-selection scheme, and it is robust when the data are heavy-tailed or contain outliers. The RSGPR model consists of two components. The first is a robust GPR model that determines the level of the dependent variable of interest, and the second is an equation that describes the selection mechanism determining whether the dependent variable is observed or not. The sample-selection bias arises when these two components are correlated, and they must then be modeled jointly. A Bayesian hierarchical methodology is developed here to estimate the RSGPR model. This methodology relies on a stochastic representation technique (see, e.g., [13]) to set up the Bayesian hierarchy of the RSGPR model, and it has three attractive features. First, although the posterior of the model's parameters does not belong to any well-known parametric family, the methodology uses a simple Markov chain Monte Carlo (MCMC) algorithm that does not resort to generating random draws directly from the complex joint posterior.
Second, the output of the algorithm not only provides a Bayesian analogue of confidence intervals for the regression function, but it also readily gives an indication of the presence (or absence) of the sample-selection bias. Third, if there is prior information, such as restrictions on the regression function, it can be incorporated easily through a prior distribution. The remainder of this study is organized as follows. Section 2 introduces the MaxEnt process regression model, which strictly includes the GPR model. This section then formulates the RSGPR model, obtained by combining the GPR model with a class of scale mixtures of normal errors and with a selection model comprising a class of scale mixtures of probit sample-selection equations. Properties of the RSGPR model are studied, including the exact distribution of a selected observation, a stochastic representation, a distributional hierarchy, and the magnitude of the sample-selection bias. In Section 3, we construct a Bayesian hierarchical model for inference in the RSGPR model by exploiting the stochastic representation and distributional hierarchy. We then develop a Bayesian estimation methodology based on the hierarchical model to provide a simple estimation procedure for the RSGPR model, and we construct a computationally feasible MCMC algorithm through the Bayesian hierarchical approach. Section 4 examines the finite-sample performance of the method through a limited but informative simulation. This numerical illustration shows the usefulness of the RSGPR model for the Gaussian process regression analysis of non-normal data with sample selection. The study then concludes with a discussion in Section 5. Proofs and additional details are provided in Appendix A.

2. Robust Sample-Selection GPR Model

2.1. MaxEnt Process Regression Model

Consider the following nonparametric regression model, in which the response vector is the sum of a vector of regression function values evaluated at the rows of the design matrix and a vector of i.i.d. random noise terms with zero mean. In the basic model structure of (1), no parametric form of the regression function is assumed, but the function is assumed to have a specific type of functional structure. For example, it can be represented with a Fourier series [14], splines [15], kernels [16], and others. In Bayesian nonparametric regression, we assume that the regression function (or signal term) is a random function that follows a particular distribution. This distribution is subjective in the sense that it reflects our uncertain prior information regarding the function. Sometimes we have a situation in which partial prior information on the regression function is available, outside of which it is desirable to use a prior that is as non-informative as possible. In this situation, Boltzmann’s maximum entropy theorem (see, e.g., [17]) yields a maximum entropy prior that has an exponential form and maximizes the entropy in the presence of partial information on various moment functions of the regression function. In the special case where we only have partial prior information about the mean vector and covariance matrix of the regression function of the Bayesian nonparametric regression model (1), Boltzmann’s maximum entropy theorem yields a Gaussian process prior distribution. Note that a Gaussian process defines a collection of random functions wherein any finite subset of the process has a multivariate normal (Gaussian) distribution. The only restriction on the Gaussian process is that the covariance function must be a positive definite symmetric (pds) matrix. If the covariance matrix is not pds, then the corresponding process will not be defined (see, e.g., [18]).
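Boltzmann's theorem can be illustrated numerically. Among all distributions with a fixed mean and unit variance, the Gaussian attains the largest differential entropy. The following sketch (ours, not from the paper) compares closed-form entropies of the unit-variance normal, Laplace, and uniform distributions:

```python
import math

# Differential entropies (in nats) of three unit-variance distributions.
# With only the mean and variance fixed, Boltzmann's maximum entropy
# theorem says the Gaussian attains the largest entropy.

h_normal = 0.5 * math.log(2 * math.pi * math.e)   # N(0, 1)

b = 1 / math.sqrt(2)                              # Laplace: var = 2b^2 = 1
h_laplace = 1 + math.log(2 * b)

w = math.sqrt(12)                                 # Uniform on width w: var = w^2/12 = 1
h_uniform = math.log(w)

print(f"normal  {h_normal:.4f}")   # ~1.4189
print(f"laplace {h_laplace:.4f}")  # ~1.3466
print(f"uniform {h_uniform:.4f}")  # ~1.2425

assert h_normal > h_laplace > h_uniform
```

The ordering holds for any competing unit-variance density, which is exactly the sense in which the Gaussian process prior is the least-informative choice given first- and second-moment information only.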
As a result, this paper introduces yet another Bayesian nonparametric regression model by combining the regression model (1) with the MaxEnt prior in Lemma 1 and introducing a normal regression error distribution. The model is named the “MaxEnt process regression model”. In its definition, the regression function follows an n-variate normal distribution, the noise variance follows an inverse gamma distribution independent of the regression function, the mean function reflects the expected function value at each input, and the covariance function models the dependence between the function values at different input points. See [19] for the choice of an appropriate covariance function based on assumptions such as smoothness and likely patterns to be expected in the data. A commonly used isotropic covariance function in practice is the squared exponential covariance function, whose hyperparameters determine the shape of the MaxEnt process regression: one stands for the global scale of the covariance matrix and the other is a smoothing parameter. We can easily see that the MaxEnt process regression model (5) is the same as the GPR model considered by [19,20]. This proves the following corollary and can hence be used as an information-theoretic justification for adopting the GPR model as a Bayesian approach to nonparametric regression analysis. Suppose the mean vector and covariance matrix constitute all the prior information for the Bayesian nonparametric regression model (1). Then the MaxEnt prior distribution is the Gaussian process prior, which defines the GPR model. According to Corollary 1, we shall refer to the MaxEnt process regression model as the GPR model. When there is no functional constraint in the GPR model, the prior specification in model (3) can be used, and posterior inference can be performed without difficulty.
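As a concrete illustration of the squared exponential covariance function, the following sketch (hypothetical names: `tau2` for the global scale, `ell` for the smoothing parameter) builds a kernel matrix and checks the pds requirement:

```python
import numpy as np

def sq_exp_cov(x1, x2, tau2=1.0, ell=1.0):
    """Squared exponential kernel: tau2 * exp(-(x1 - x2)^2 / (2 * ell^2)).
    tau2 sets the global scale of the covariance; ell controls smoothness."""
    d = x1[:, None] - x2[None, :]
    return tau2 * np.exp(-0.5 * (d / ell) ** 2)

x = np.linspace(0.0, 5.0, 30)
K = sq_exp_cov(x, x)

# The kernel matrix must be symmetric positive definite (pds);
# otherwise the Gaussian/MaxEnt process prior is not defined.
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K + 1e-10 * np.eye(30))) > 0.0
```

Larger `ell` produces smoother sample paths; larger `tau2` inflates the prior variance of the regression function uniformly.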
It is seen from [20] that the conditional posterior distribution of the regression function is normal, with mean and covariance available in closed form. However, in GPR analysis, a sample-selection scheme often applies to the response variable, resulting in missing-not-at-random (MNAR) observations on the variable. In this case, a regression analysis using only the selected cases will lead to biased results (see, e.g., [6,7,8]). This study provides a bias-correction procedure for the GPR analysis with MNAR data generated from the sample-selection scheme. For the analysis, we propose a robust sample-selection GPR (RSGPR) model based on a family of scale mixtures of normal (SMN) distributions (see [21,22] for details). This approach reflects the MNAR mechanism, is robust against departures from the normality assumption (see, e.g., [11,12]), and yields a robust GPR model to analyze the partially observed sample-selection data.
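The closed-form conditional posterior of the regression function under the GPR model can be sketched as follows. This is the generic textbook computation, not the paper's implementation; the kernel settings and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_exp(a, b, tau2=1.0, ell=0.7):
    return tau2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

# Noisy observations y = f(x) + eps, eps ~ N(0, sigma2).
x = np.linspace(0.0, 2 * np.pi, 40)
sigma2 = 0.1
y = np.sin(x) + rng.normal(0.0, np.sqrt(sigma2), x.size)

# Conditional posterior of f | y is normal with
#   mean = K (K + sigma2 I)^{-1} y,  cov = K - K (K + sigma2 I)^{-1} K.
K = sq_exp(x, x)
A = np.linalg.solve(K + sigma2 * np.eye(x.size), np.eye(x.size))
post_mean = K @ A @ y
post_cov = K - K @ A @ K

# Conditioning on data shrinks the prior variance and smooths the noise.
assert np.all(np.diag(post_cov) <= np.diag(K) + 1e-8)
assert np.mean((post_mean - np.sin(x)) ** 2) < np.mean((y - np.sin(x)) ** 2)
```

Under sample selection, however, this posterior is computed from a non-random subsample, which is precisely where the bias discussed below enters.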

2.2. Proposed Model

We propose the RSGPR model through the following steps. First, we modify the GPR model (3) by incorporating the SMN error distribution for a robust GPR analysis. Then, we connect the robust GPR model directly to a sample-selection model by introducing latent variables to explain the partially observed sample-selection data. To model the sample-selection mechanism, we need some notation for the partially observed data. Let a binary variable take the value 1 if the response of subject i is observed under the sample-selection scheme, and 0 if it is not. Then, we introduce the RSGPR model to represent the regression equation of the observable variable, built from an indicator function and a scale mixture of bivariate normal distributions with a mixture function and a mixing variable. The remaining parameters are to be elicited through their priors. Without loss of generality, we assume that a single sample-selection scheme is applied to a random sample of n observations associated with the model (1) and yields only the first observed responses out of the n observations according to the sample-selection scheme. Thus, the overall available data information of the RSGPR model consists of the set of binary selection indicators together with the tuples of observations corresponding to individuals whose responses are observed. The purpose of this study is to estimate the regression function based on the partially observed data (i.e., the sample-selection data). The density of the RSGPR model (6) is composed of a continuous component and a discrete component. The discrete component involves a selection interval and the probability of that interval under the corresponding distribution. The continuous component is the density of an observed response. This distribution is essentially a member of the class of skew-scale mixtures of normal (skew-SMN) distributions discussed by [13,23,24].
We will denote the distribution law of an observed response with density (8) as a skew-SMN distribution. The following lemma is useful for generating the partially observed responses and for indicating the difference between the RSGPR model (6) and the GPR model (3). Lemma 2 shows that the RSGPR model serves to relax the classic assumption of underlying normality as well as to reflect the sample-selection scheme. The lemma also indicates that the partially observed data do not represent a random sample from the GPR model generating the responses, even after controlling for the regression function. If we want to apply a GPR analysis to the partially observed sample-selection data, the fitted model should be the RSGPR model. The RSGPR model changes depending on the choice of the distribution of the mixing variable and its mixture function. In the special case wherein the mixing distribution is degenerate, the RSGPR model produces a sample-selection Gaussian process normal error regression (SGPR-N) model. When we choose a gamma distribution with mean 1, the model becomes a sample-selection Gaussian process t error regression (SGPR-t) model, allowing the degrees of freedom to regulate the tail behavior of the model. We also see that the RSGPR model strictly includes the GPR model, because the latter is obtained as a special case. For the remainder of this study, we use the symbols in the preceding sections with the same definitions.
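The SGPR-t construction rests on the standard fact that a normal scale mixture with Gamma(ν/2, rate ν/2) mixing (mean 1) yields Student-t errors marginally. A quick Monte Carlo sketch (ours, not the paper's code) confirms the heavier tails relative to the normal:

```python
import numpy as np

rng = np.random.default_rng(1)
nu, n = 4.0, 200_000

# Scale mixture of normals: eps | eta ~ N(0, 1 / eta),
# eta ~ Gamma(nu/2, rate = nu/2) (mean 1)  =>  eps ~ t_nu marginally.
eta = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
eps = rng.normal(0.0, 1.0 / np.sqrt(eta))

# Tail mass beyond |3|: the mixture is markedly heavier-tailed
# than the standard normal, which motivates its use for robustness.
tail_mix = np.mean(np.abs(eps) > 3.0)
tail_norm = np.mean(np.abs(rng.normal(size=n)) > 3.0)
assert tail_mix > tail_norm
```

Small degrees of freedom `nu` give heavy tails (robustness to outliers); as `nu` grows, the mixing distribution concentrates at 1 and the SGPR-N normal-error model is recovered.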

2.3. The Sample-Selection Bias

As indicated by the density (8) and Lemma 2, the selected observations do not represent a random sample from the GPR model generating the responses; rather, they are missing not at random (MNAR) [25], inducing a sample-selection bias. The following results on the sample-selection bias are relevant to the Bayesian estimation of the GPR model with the partially observed data. As shown in Lemma 3, if we use the partially observed responses to estimate the GPR model, the presence of the truncated normal distribution term in Equation (10) induces a biased estimate of the regression function. Note that the distribution becomes normal in the GPR model for the case where the responses are fully observed. Therefore, the usual estimation of the regression function based on the GPR model will produce inconsistent results under sample selection. This clearly reveals that sample-selection bias occurs in the Bayes estimation of the regression function, and the magnitude of this bias is quantified in Corollary 2. A sample-selection bias in calculating the marginal effect (or propensity) of a predictor can also be expected. To compare the SGPR model with its competitor, various values of the sample-selection bias associated with the posterior mean (see Corollary 2) and the difference in the marginal effect of the k-th predictor (see Corollary 3) were calculated and are depicted in Figure 1. For the calculation, we used an intra-class covariance matrix built from a summing vector whose elements are all one. The left panel in Figure 1 graphs the sample-selection bias, showing the values of the first element of the bias vector given in Corollary 2. From the graph, we see that the sample-selection bias occurs in the GPR analysis with sample selection, and its magnitude becomes larger as the relevant parameter values become larger.
The sign of the bias is opposite to that of the correlation parameter. The right panel shows a graph of the difference in the marginal effect (defined by Equation (11)). This graph shows that the absolute value of the difference increases rapidly as the selection parameter grows, and the difference tends to be larger as the absolute value of the correlation becomes larger. Furthermore, the signs of the difference and of the correlation differ, as expected. These panels imply that an inconsistent nonparametric regression analysis is unavoidable when the GPR model is fitted to the partially observed sample-selection data. Instead, the proposed RSGPR model should be used to correct the sample-selection bias and to estimate the true marginal effect of each predictor in the regression analysis.
Figure 1

Graphs of the sample-selection bias and the difference in marginal effect of the k-th predictor.
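The mechanism behind the bias can be reproduced with a toy Heckman-type simulation (ours, not the setup behind Figure 1). When the outcome and selection errors are correlated bivariate standard normal with correlation ρ, and observation requires a positive selection index, the mean of the selected outcome errors shifts away from zero by ρ·E[δ | δ > 0] = ρ√(2/π):

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 500_000, 0.7

# Outcome error e and selection error d: correlated bivariate standard normal.
cov = [[1.0, rho], [rho, 1.0]]
e, d = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# The response is observed only when the selection index is positive.
selected = d > 0.0

# The mean of the selected errors is biased away from 0 whenever rho != 0,
# so a GPR fit to the selected sample inherits this sample-selection bias.
bias = e[selected].mean()
print(round(bias, 3))  # close to rho * sqrt(2 / pi) ≈ 0.559 here

assert abs(bias - rho * np.sqrt(2 / np.pi)) < 0.01
```

Setting `rho = 0.0` makes the empirical bias vanish, matching the observation above that the GPR analysis is consistent only when the two model components are uncorrelated.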

3. Bayesian Hierarchical Methodology

3.1. Hierarchical Representation of the RSGPR Model

Let us revisit the RSGPR model (6) in Section 2.2. From Equations (7) and (8), we see that the log-likelihood function of the RSGPR model based on the partially observed n-tuples of observations is a complex function for the Bayesian estimation of the parameters of the RSGPR model. Instead, the following hierarchical representation of the RSGPR model is useful for a simple estimation of the parameters. First, the likelihood function in Equation (12) can be represented by a distributional hierarchy over the n pairs of independent observations. When prior information on the parameters is not available, a convenient strategy for avoiding an improper posterior distribution is to use proper priors with their hyperparameters fixed at appropriate quantities that reflect the diffuseness of the priors (i.e., limiting non-informative priors). For this purpose, the prior distributions in Theorem 1 are used to elicit the prior distributions of the parameters. All hyperparameters that appear in the prior distributions of the Bayesian hierarchical model are assumed to be given from the prior information of previous studies or other sources.

3.2. Full Conditional Posteriors

Let the observed data be given, and suppose further that the latent observation vector and the scale mixing vector are introduced. Then, based on the RSGPR model, we obtain the joint posterior distribution of the parameters given the observed data set, in which the p.d.f. of the scale mixing variable appears as a factor. Note that the joint posterior in (13) cannot be simplified into the analytic form of a known density and is thus intractable for direct posterior inference. Instead, we derive the conditional posterior distribution of each parameter in explicit form, which is useful for posterior inference using a Markov chain Monte Carlo (MCMC) method. Given the joint posterior distribution (13), we can obtain the following full conditional posterior distributions, whose derivations are provided in Appendix A: the first full conditional posterior distribution is normal; the next is an inverse gamma distribution; the next is a normal distribution; the full conditional posterior distributions of the latent observations are mutually independent; one full conditional posterior density is known only up to its normalizing constant; and the full conditional posterior densities of the mixing variables are mutually independent.

3.3. Markov Chain Monte Carlo Method

The MCMC scheme, working with the full conditional distributions of the parameters, is not complicated to implement. A routine Gibbs sampler can be used to generate posterior samples of the parameters based on each of their full conditional posterior distributions obtained in Section 3.2. In the posterior sampling of the mixing variables, a Metropolis–Hastings (M–H) step within the Gibbs algorithm can be applied, because their conditional posterior densities may not have the explicit form of a known distribution, as in Equation (19). For Gibbs sampling, one should note the following points. Given the initial values, the implementation of the Gibbs sampler involves R iterative samplings from each of the full conditional posterior distributions obtained in Equations (14) through (19). Gibbs samples of derived quantities can be obtained from those of the sampled parameters. If the mixing distribution is degenerate, the RSGPR model reduces to the SGPR-N model; in this case, the MCMC procedure excludes the Gibbs sampling of the mixing variables via the posterior distribution (19). Various distributions of the mixing variable and mixing functions of the SMN family yield different models (see, e.g., [21,22]); with gamma mixing, the RSGPR model becomes the SGPR-t model, and the mixing variables can be generated from their known conditional posteriors. Except for the SGPR-N and SGPR-t models, we need to adopt the Metropolis–Hastings algorithm within the Gibbs sampler, because the conditional posterior density of the mixing variable does not have the explicit form of a known distribution. See [26,27] for the algorithm for sampling from such a posterior density. When the squared exponential covariance function in Equation (4) is chosen with unknown hyperparameters, we need to elicit their priors for the full Bayes method based on MCMC. The priors considered by [28] can be used for this assessment: a conjugate prior for the scale hyperparameter and a half-Cauchy prior for the other.
The half-Cauchy distribution has a location parameter and a scale parameter; see [28] for eliciting the corresponding prior information. The full conditional posterior distributions of the hyperparameters follow, and the conditional posterior density of the smoothing hyperparameter does not have the explicit form of a known distribution. This implies the use of the Metropolis–Hastings algorithm within the Gibbs sampler to generate draws from that posterior density. After obtaining the Gibbs samples, we can use them for Monte Carlo estimation of the regression function and the missing observations. They can also be used for predicting regression functions and responses evaluated at new predictors (see, e.g., [26]).
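The alternating structure of such a sampler can be illustrated on a toy conjugate model (unknown normal mean and variance). This is only a structural sketch of Gibbs sampling, not the RSGPR sampler itself; the priors (flat on the mean, proportional to 1/σ² on the variance) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy Gibbs sampler with conjugate full conditionals, mirroring the
# structure used for the RSGPR model: alternate draws from each full
# conditional instead of sampling the intractable joint posterior.
y = rng.normal(2.0, 1.5, size=300)
n, ybar = y.size, y.mean()

mu, sig2 = 0.0, 1.0
draws = []
for r in range(4000):
    # mu | sig2, y  ~  N(ybar, sig2 / n)            (flat prior on mu)
    mu = rng.normal(ybar, np.sqrt(sig2 / n))
    # sig2 | mu, y  ~  Inv-Gamma(n/2, S/2), S = sum((y - mu)^2)   (prior ∝ 1/sig2)
    sig2 = 1.0 / rng.gamma(n / 2, 2.0 / np.sum((y - mu) ** 2))
    if r >= 1000:  # discard burn-in
        draws.append((mu, sig2))

mu_hat = np.mean([d[0] for d in draws])
sig2_hat = np.mean([d[1] for d in draws])
assert abs(mu_hat - 2.0) < 0.3 and abs(sig2_hat - 1.5 ** 2) < 0.5
```

In the RSGPR sampler the same loop cycles through the full conditionals of Section 3.2, with M–H steps substituted for the conditionals that lack a known form.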

3.4. Prediction with Bias Corrected Regression Function

According to the Gaussian (i.e., MaxEnt) process prior, the joint distribution of the training outputs and the test outputs is multivariate normal, with off-diagonal blocks given by the matrix of covariances evaluated at all pairs of training and test points, and similarly for the other blocks. The RSGPR framework thus provides a straightforward way of predicting test outputs based on the relevant test points and the training outputs. Conditioning the joint Gaussian prior distribution on the training observations, we arrive at the predictive distribution for the future (or missing) regression function. A regression function value can be sampled from the predictive distribution (21) by evaluating the mean and covariance matrix of the distribution. Thus, it can be generated within the preceding MCMC algorithm for estimating the RSGPR model: in the r-th iteration of the algorithm, we can generate the test regression function and the unobserved observation vector from a Markov chain augmented by the corresponding conditional distributions. Given the respective samples generated from the R iterations, the bias-corrected expected value of the regression function and that of the posterior predictive distribution can then be approximated via Monte Carlo averaging.
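Conditioning the joint Gaussian prior on training outputs and drawing one predictive sample can be sketched generically as follows (illustrative kernel, inputs, and noise level; not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(4)

def kern(a, b, tau2=1.0, ell=0.8):
    return tau2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

x_tr = np.linspace(0.0, 5.0, 25)   # training inputs
x_te = np.linspace(0.0, 5.0, 50)   # test inputs
sigma2 = 0.05
f_tr = np.cos(x_tr) + rng.normal(0.0, np.sqrt(sigma2), x_tr.size)

# Condition the joint Gaussian prior on the training outputs:
#   mean* = K(te,tr) [K(tr,tr) + sigma2 I]^{-1} y
#   cov*  = K(te,te) - K(te,tr) [K(tr,tr) + sigma2 I]^{-1} K(tr,te)
Ktt = kern(x_tr, x_tr) + sigma2 * np.eye(x_tr.size)
Kst = kern(x_te, x_tr)
sol = np.linalg.solve(Ktt, Kst.T)
mean_star = sol.T @ f_tr
cov_star = kern(x_te, x_te) - Kst @ sol

# Draw one predictive sample of the regression function at the test points
# (small jitter keeps the Cholesky factorization numerically stable).
L = np.linalg.cholesky(cov_star + 1e-6 * np.eye(x_te.size))
f_star = mean_star + L @ rng.normal(size=x_te.size)
assert f_star.shape == (50,)
```

Within the MCMC algorithm this draw would be repeated at every iteration, with the training outputs and hyperparameters replaced by their current (bias-corrected) sampled values, so that the Monte Carlo average over iterations approximates the posterior predictive mean.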

4. Numerical Illustrations

This section presents empirical results of the Bayesian hierarchical RSGPR analysis of non-normal data with sample selection. We provide results obtained from simulated data applications comparing the performance of the RSGPR model with that of the GPR model. Our program was written in R (see, e.g., [29]) and is available from the authors upon request.

4.1. Simulation Scheme

In this simulation, we evaluated the finite-sample performance of the RSGPR model by using sample-selection data generated for different sizes. The performance was assessed in terms of sample-selection bias correction and robustness to non-normal model errors. These were measured by comparing the posterior estimation and prediction results of the RSGPR model with those based on the GPR model. Specifically, we compared the results obtained from the SGPR-N (or SGPR-t) analysis with the results of the corresponding GPR analysis based on partially observed sample-selection data. This study also demonstrates that the SGPR-t model is more robust against outliers than the SGPR-N model. To evaluate the performance, we generated sets of partially observed sample-selection data (with a missing rate of 0.5) from each of three models (see details below), all sharing the same general form of model (22). Model 1 was defined by assuming that the mixing distribution G is degenerate. Model 2 was obtained from model (22) by choosing gamma mixing, giving t errors. Model 3 assumed a mixture of bivariate normal errors instead. Throughout our simulation, the hyper-parameters for the Bayesian hierarchical model in Theorem 1 were chosen to reflect the diffuseness of the priors, yielding limiting non-informative priors. Note that when our observational data were augmented through proper prior information, as in this simulation study, the issue of identifying the parameters in the RSGPR model disappeared. In the simulation, we proceeded as follows to estimate the parameters. Using each of the datasets generated from the models (Model 1, Model 2, and Model 3), we fitted the RSGPR and GPR models and applied the proposed Bayesian hierarchical methodology to estimate the parameters of the fitted models under the above prior distributions.
To implement the methodology for each generated dataset, we obtained 15,000 posterior samples from the developed MCMC algorithm (Section 3) with a thinning period of 5 after a burn-in period of 5000 samples. This sampling plan ensured convergence of the MCMC chain. The MCMC method (applied to each dataset) gave estimates (or predictions) of the nonparametric regression function as well as of the other parameters of the RSGPR model. The variability in the regression function estimates and predictions obtained from a dataset is visualized in Figure 2 and Figure 3. These figures compare the estimates (or predictions) of the nonparametric regression function obtained from the two models (the RSGPR and GPR models). The black line in each graph shows the true regression function of model (22). The red dashed line denotes the posterior mean (or predicted value) of the regression function obtained from a Bayesian hierarchical RSGPR analysis of the sample-selection data, while the blue dashed line depicts that obtained from a GPR analysis of the same data. The 97.5th and 2.5th quantiles of the 3000 posterior samples (predictions) of each regression function value in the RSGPR model were also calculated. In each figure, these quantiles were used to draw 95% posterior (or prediction) intervals as a gray band. The accuracy of the parameter estimates was measured by the mean absolute bias (MAB) and the root mean square error (RMSE), each computed over replications from the posterior estimate of the ℓ-th element of the parameter vector in the k-th replication.
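Under the usual reading of these criteria, MAB averages the absolute estimation errors and RMSE takes the root of the averaged squared errors across replications. The helper below and its toy replication values are our own illustration, not the paper's data:

```python
import numpy as np

def mab_rmse(estimates, truth):
    """Accuracy over replications: estimates has shape (K, p), truth (p,).
    MAB  = average of |theta_hat - theta| over replications and elements.
    RMSE = sqrt of the average of (theta_hat - theta)^2."""
    err = np.asarray(estimates) - np.asarray(truth)
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))

# Three hypothetical replications of two parameter estimates (sigma, rho).
est = np.array([[2.9, 0.45], [3.1, 0.55], [2.8, 0.50]])
mab, rmse = mab_rmse(est, truth=np.array([3.0, 0.5]))
print(round(mab, 4), round(rmse, 4))  # → 0.0833 0.1041
```

Smaller values of both criteria indicate more accurate recovery of the true parameters across the simulated datasets.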
Figure 2

Graphs of estimated regression functions (left panel) and predicted regression functions (right panel): (i) black lines are used for the true regression function; (ii) red dashed lines for the robust sample-selection Gaussian process regression (RSGPR) models; (iii) blue dashed lines for the Gaussian process regression (GPR) models.

Figure 3

Graphs of regression functions: estimated regression functions (left panel) and predicted regression functions (right panel).

4.2. Performance of the RSGPR Model

4.2.1. Sample-Selection Data Generated from Model 1

If the distribution G is degenerate, we obtain the SGPR-N model from the RSGPR model (6). Using each of the datasets generated from Model 1, the proposed Bayesian hierarchical methodology was applied to estimate the parameters of the model; with the selection correlation set to zero, the methodology yielded posterior estimates of the GPR model. The estimation results for the parameter of primary interest, based on the SGPR-N and GPR models, are shown in the left panel of Figure 2, which provides the following results. First, the posterior estimates based on the proposed SGPR-N model (red dashed line) are close to their true values (black line), while those based on the GPR model (blue dashed line) exhibit severe sample-selection bias. Second, when the SGPR-N model was used to fit the generated sample-selection dataset, the posterior estimates of the regression function concentrated around the true values, as shown by their 95% posterior intervals (gray band). Third, the difference between the true regression function (black line) and the regression function estimated by the GPR model (blue dashed line) confirms Lemma 4, which shows the existence of the sample-selection bias in the GPR regression for data with sample selection. In summary, the left panel of Figure 2 illustrates the existence of the sample-selection bias in the GPR analysis of the sample-selection data, as discussed in Section 2.3. It also demonstrates the ability of the proposed methodology based on the SGPR-N model to eliminate the sample-selection bias (or inconsistency in estimating the regression function), which could not be achieved with the GPR model. The means of the estimation results for the other parameters are listed in Table 1. As shown in Table 1, the MCMC parameter estimates were close to their true values for the SGPR-N model, while those based on the GPR model were severely biased.
In addition, the MAB and RMSE values in the table confirm that the performance of the SGPR-N model is far better than that of the GPR model when the sample-selection data are used for a Bayesian nonparametric regression analysis. For both models, the small Monte Carlo (MC) error of each parameter (compared to the RMSE) suggests that approximate convergence was reached and that the sequence generated from the MCMC samples was well mixed.
Table 1

Posterior Summary.

             SGPR-N Model                           GPR Model
True Value   Mean   s.d.   RMSE   MAB    MC Error   Mean   s.d.   RMSE   MAB    MC Error
σ = 3        2.831  0.308  0.351  0.426  0.018      2.094  0.104  0.912  0.800  0.002
ρ = 0.5      0.380  0.376  0.563  0.287  0.064      NA     NA     NA     NA     NA

             SGPR-t10 Model                         GPR-t10 Model
σ = 3        2.880  0.974  0.509  0.515  0.050      2.130  0.109  0.876  0.800  0.003
ρ = 0.5      0.435  0.275  0.627  0.422  0.032      NA     NA     NA     NA     NA

s.d.: standard deviation; SGPR: sample-selection Gaussian process normal error regression; RMSE: root mean square error; MAB: mean absolute bias; MC: Monte Carlo; GPR: Gaussian process regression.
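The summary statistics reported in Table 1 can be sketched as follows. This is a hedged illustration assuming access to a chain of posterior MCMC draws of a scalar parameter (the draws here are synthetic stand-ins centered near sigma = 3, not output of the paper's sampler); the MC error uses the standard batch-means estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

true_value = 3.0
draws = true_value + 0.3 * rng.standard_normal(5000)   # stand-in MCMC chain

mean = draws.mean()
sd = draws.std(ddof=1)                                  # posterior s.d.
rmse = np.sqrt(np.mean((draws - true_value) ** 2))      # root mean square error
mab = np.mean(np.abs(draws - true_value))               # mean absolute bias

# Monte Carlo error via batch means: split the chain into batches and use the
# variability of the batch means to estimate the sampling error of the mean.
n_batches = 50
batch_means = draws.reshape(n_batches, -1).mean(axis=1)
mc_error = batch_means.std(ddof=1) / np.sqrt(n_batches)

print(mean, sd, rmse, mab, mc_error)
```

As in the table, an MC error that is small relative to the RMSE indicates that the chain's sampling noise is negligible compared with the posterior spread.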

4.2.2. Data Generated from Model 2

The proposed Bayesian hierarchical methodology for the SGPR-t model was applied to each of the datasets generated from Model 2. The SGPR-t model is obtained from the RSGPR model (6) under a heavy-tailed t10 error specification, and under a further restriction the methodology also yields posterior samples for estimating the GPR-t model (the GPR model with t10 errors). The results of the simulation appear in the right panel of Figure 2 and in Table 1. The graphs in the right panel of Figure 2 depict the prediction results based on the SGPR-t and GPR-t models. They clearly show that the sample-selection bias in the GPR-t predictions is too large to allow recovery of the true regression function (or a future regression function). In contrast, the proposed methodology using the SGPR-t model correctly predicted the true regression function; see the 95% prediction interval and the predicted regression function (red dashed line). Prediction based on the SGPR-t model is thus far better than that based on the GPR-t model. Compared with the GPR-t model, the SGPR-t methodology also yields smaller MAB and RMSE values for the parameter estimates; see Table 1. Table 1 further shows that the parameter estimates of the SGPR-t model with heavy-tailed errors tend to have larger estimation errors (MAB and RMSE) than those of the SGPR-N model. These simulation results demonstrate the superior performance of the SGPR model and the usefulness of the proposed Bayesian hierarchical methodology for remedying the sample-selection bias in prediction that arises in a GPR analysis of sample-selection data.
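For reference, the kind of posterior mean and 95% pointwise band plotted in Figure 2 can be computed with the standard GP predictive equations. The sketch below is a generic toy (noisy sine data, squared-exponential kernel with assumed hyperparameters), not the paper's Model 2 or its sampler; it only illustrates the predictive machinery shared by the GPR component of the models.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(a, b, ell=0.5, tau=1.0):
    """Squared-exponential covariance between 1-d input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return tau**2 * np.exp(-0.5 * d2 / ell**2)

# Toy training data (an assumption for illustration): noisy sine observations.
x = np.linspace(0.0, 3.0, 30)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(x.size)
xs = np.linspace(0.0, 3.0, 100)       # prediction grid
noise = 0.1**2

# Standard GP posterior: mean K_*x (K + noise I)^{-1} y and covariance
# K_** - K_*x (K + noise I)^{-1} K_x*.
K = rbf(x, x) + noise * np.eye(x.size)
Ks = rbf(xs, x)
Kss = rbf(xs, xs)
alpha = np.linalg.solve(K, y)
mean = Ks @ alpha
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))

lower, upper = mean - 1.96 * sd, mean + 1.96 * sd   # 95% pointwise band
```

With sample-selection data, plugging the selected subsample directly into these equations produces the biased blue curves of Figure 2; the SGPR models correct the likelihood before this predictive step.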

4.2.3. Data Generated from Model 3 with Normal Mixture Errors

We generated datasets from Model 3, which was defined by model (22) with independent bivariate normal mixture errors; 50% of the outcomes were missing in each dataset. Each generated dataset was fitted, in turn, to the three SGPR models under comparison. Based on the posterior samples, we calculated the Bayes estimates of the three models' parameters together with their deviance information criterion (DIC) values, introduced by [30]. The average DIC values obtained from the datasets were 2727.06, 1477.43, and 1396.24, respectively, so the model with the smallest DIC is the best-fitting model among the three, while the model with the largest DIC is the worst. The graphs in the left panel of Figure 3 show the true regression function (black line) and the regression functions estimated under the best-fitting SGPR model (red line) and the worst-fitting SGPR model (blue line). The graphs in the right panel of Figure 3 depict the predicted regression function based on the best-fitting model (in red), the worst-fitting model (in blue), and the true regression function (in black). Even though the error distribution of the best-fitting model was misspecified relative to the bivariate normal mixture, the 95% posterior intervals (or prediction intervals) obtained from that model still included the true regression function values (see the gray bands in Figure 3). The predictions of the other two SGPR models are very erratic because of outliers generated by the normal mixture errors, while the best-fitting SGPR model is more robust to the model misspecification.
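The DIC used above for model comparison is DIC = D_bar + p_D, where D_bar is the posterior mean deviance and p_D = D_bar - D(theta_bar) is the effective number of parameters. A minimal sketch, assuming a simple normal model with known unit variance rather than an SGPR model (the stand-in posterior draws are synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: true mean 1, known sd 1 (an assumption for illustration).
y = rng.standard_normal(200) + 1.0

# Stand-in posterior draws of the mean under a flat prior.
mu_draws = y.mean() + rng.standard_normal(4000) / np.sqrt(y.size)

def deviance(mu):
    # D(mu) = -2 * log-likelihood of the N(mu, 1) model for the sample y.
    return np.sum((y - mu) ** 2 + np.log(2.0 * np.pi))

dev_draws = np.array([deviance(m) for m in mu_draws])
dev_bar = dev_draws.mean()                 # posterior mean deviance D_bar
dev_at_mean = deviance(mu_draws.mean())    # deviance at the posterior mean
p_d = dev_bar - dev_at_mean                # effective number of parameters
dic = dev_bar + p_d                        # DIC = D_bar + p_D
```

For this one-parameter model p_D comes out close to 1, as expected; lower DIC indicates a better trade-off between fit and effective complexity, which is how the three SGPR variants were ranked.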

5. Conclusions

This study considered a MaxEnt approach to develop a Bayesian nonparametric regression analysis of non-normal data with sample selection. For this purpose, using Boltzmann's maximum entropy theorem, we introduced a MaxEnt process regression model that reflects partial prior information about an uncertain regression function, and we found that a special case of the MaxEnt regression model reduces to the well-known GPR model. Second, we generalized the GPR model to propose the RSGPR model and explored its theoretical properties. These properties showed that the new model is well designed to correct the sample-selection bias and to implement a robust GPR analysis. Third, we developed a hierarchical RSGPR model based on a stochastic representation of the RSGPR model and proposed a Bayesian hierarchical methodology for the RSGPR analysis of non-normal data with sample selection. A simulation study showed that the proposed methodology has good finite-sample performance: it eliminated the sample-selection bias and estimated the population model parameters robustly and with high accuracy. In a comparative numerical study of nonparametric regression models with sample-selection data, we found that the estimation results using the RSGPR model outperformed those using the GPR model for both in-sample estimation and out-of-sample forecasts.

The theoretical results of the RSGPR model and the methodology for the RSGPR analysis proposed in this study raise several issues worth considering further. First, the RSGPR framework using the MaxEnt process prior can be generalized to a so-called stochastically constrained RSGPR model that uses the constrained MaxEnt process as the prior distribution of the regression function with uncertain constraints. Second, an empirical study with real data, as well as an asymptotic evaluation such as consistency, would be particularly worthwhile to explore. For example, estimating a monotone regression function with or without uncertainty, and testing the monotonicity of the regression function, can be considered in the context of a constrained RSGPR analysis with sample-selection data. Finally, the Bayesian hierarchical methodology can be broadened to various regression models with the general class of skewed error distributions considered by [11]. For example, the methodology can be applied to a von Bertalanffy growth curve analysis of heavy-tailed fishery data with sample selection (see, e.g., [28]). We hope to address all of these in the future.