Fazli Rabbi, Alamgir Khalil, Ilyas Khan, Muqrin A Almuqrin, Umair Khalil, Mulugeta Andualem.
Abstract
Outlying observations have a large influence on the linear model selection process. In this article, we present a novel approach to robust model selection in linear regression for situations in which outliers are present in the data. The model selection criterion is based on two components: the robust conditional expected prediction loss and a robust goodness-of-fit measure with a penalty term. We estimate the conditional expected prediction loss using the out-of-bag stratified bootstrap approach. In the presence of outliers, the stratified bootstrap ensures that the bootstrap samples are similar to the original sample data. Furthermore, to control the undue effect of outliers, we use the robust MM-estimator and a bounded loss function in the proposed criterion. Specifically, we observe that instead of minimizing the penalized loss function or the conditional expected prediction loss separately, it is better to minimize them simultaneously. The simulation and real-data studies confirm the consistent and satisfactory behavior of our bootstrap model selection procedure in the presence of response outliers and covariate outliers.
Year: 2022 PMID: 35768449 PMCID: PMC9243146 DOI: 10.1038/s41598-022-14398-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
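The abstract's out-of-bag stratified bootstrap with a bounded loss can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' procedure: strata are formed by a median/MAD outlier flag, the bounded loss is Tukey's bisquare with tuning constant 4.685, and an in-bag median of an intercept-only model stands in for the MM-estimator. The function names (`stratified_oob_split`, `oob_prediction_loss`) are hypothetical.

```python
import random
import statistics

C = 4.685  # common bisquare tuning constant; an assumption, not from the article


def tukey_bisquare(r, c=C):
    """Bounded loss: about r**2 / 2 near zero, capped at c**2 / 6."""
    if abs(r) >= c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


def stratified_oob_split(strata, rng):
    """Resample indices with replacement inside each stratum.

    Returns (in_bag, out_of_bag); each stratum keeps its original size in-bag.
    """
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    in_bag = []
    for members in groups.values():
        in_bag.extend(rng.choice(members) for _ in members)
    drawn = set(in_bag)
    out_of_bag = [i for i in range(len(strata)) if i not in drawn]
    return in_bag, out_of_bag


def oob_prediction_loss(y, strata, fit, predict, scale, n_boot=100, seed=0):
    """Average bounded loss on out-of-bag points over n_boot resamples."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_boot):
        in_bag, oob = stratified_oob_split(strata, rng)
        if not oob:  # nothing left to predict in this resample
            continue
        model = fit(in_bag)
        losses.append(statistics.fmean(
            [tukey_bisquare((y[i] - predict(model, i)) / scale) for i in oob]
        ))
    return statistics.fmean(losses)


# Toy data: eight regular responses and two gross outliers.
y = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 30.0, 28.5]
med = statistics.median(y)
scale = 1.4826 * statistics.median([abs(v - med) for v in y])  # MAD scale
strata = ["outlier" if abs(v - med) / scale > 3 else "clean" for v in y]


def fit(idx):
    # Stand-in for a robust fit: the in-bag median of an intercept-only model.
    return statistics.median([y[i] for i in idx])


def predict(model, i):
    return model


loss = oob_prediction_loss(y, strata, fit, predict, scale)
print(round(loss, 3))
```

Because resampling happens within each stratum, every bootstrap sample contains the same number of flagged and unflagged observations as the original data, and the capped loss keeps a single out-of-bag outlier from dominating the average.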
Estimated selection probabilities of , , and based on the least squares estimator and .
| True | Model | m = 15 | m = 20 | m = 25 | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,4,5 | 0.010 | 0.006 | 0.013 | 0.009 | 0.024 | 0.014 | 0.030 | 0.018 | 0.042 | 0.023 | 0.038 | 0.027 | 0.046 | |
| 1,3,4 | 0.019 | 0.010 | 0.021 | 0.011 | 0.050 | 0.014 | 0.044 | 0.023 | 0.100 | 0.038 | 0.075 | 0.039 | 0.046 | |
| 1,2,4 | 0.028 | 0.012 | 0.036 | 0.022 | 0.046 | 0.029 | 0.046 | 0.032 | 0.069 | 0.034 | 0.060 | 0.036 | 0.057 | |
| 1,3,4,5 | 0.000 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.001 | 0.001 | 0.005 | 0.001 | 0.004 | 0.001 | 0.004 | |
| 1,2,4,5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.001 | 0.001 | 0.004 | 0.000 | 0.005 | 0.000 | 0.009 | |
| 1,2,3,4 | 0.000 | 0.000 | 0.001 | 0.000 | 0.003 | 0.000 | 0.002 | 0.000 | 0.008 | 0.001 | 0.004 | 0.001 | 0.003 | |
| 1,2,3,4,5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,3,4,5 | 0.013 | 0.007 | 0.021 | 0.010 | 0.043 | 0.019 | 0.039 | 0.025 | 0.077 | 0.041 | 0.063 | 0.040 | 0.054 | |
| 1,2,4,5 | 0.022 | 0.015 | 0.024 | 0.018 | 0.048 | 0.031 | 0.045 | 0.035 | 0.071 | 0.045 | 0.060 | 0.043 | 0.063 | |
| 1,2,3,4,5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.002 | 0.002 | 0.001 | 0.014 | 0.004 | 0.011 | 0.003 | 0.006 | |
| 1,4,5 | 0.013 | 0.022 | 0.005 | 0.007 | 0.002 | 0.012 | 0.001 | 0.003 | 0.000 | 0.000 | 0.000 | 0.003 | 0.000 | |
| 1,2,5 | 0.001 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,3,4,5 | 0.001 | 0.003 | 0.001 | 0.001 | 0.004 | 0.005 | 0.001 | 0.003 | 0.004 | 0.005 | 0.001 | 0.003 | 0.001 | |
| 1,2,3,4,5 | 0.009 | 0.007 | 0.015 | 0.008 | 0.038 | 0.017 | 0.040 | 0.023 | 0.080 | 0.044 | 0.069 | 0.045 | 0.065 | |
| 1,3,4,5 | 0.071 | 0.097 | 0.027 | 0.049 | 0.015 | 0.032 | 0.006 | 0.012 | 0.008 | 0.018 | 0.002 | 0.008 | 0.001 | |
| 1,2,4,5 | 0.010 | 0.020 | 0.006 | 0.009 | 0.000 | 0.003 | 0.000 | 0.000 | 0.001 | 0.003 | 0.000 | 0.001 | 0.000 | |
| 1,2,3,5 | 0.011 | 0.014 | 0.000 | 0.003 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
The (*) indicates the optimal model.
Significant values are in bold.
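The selection probabilities in these tables are described in the table notes as coming from repeated Monte Carlo simulations. A minimal sketch of how such a probability is typically estimated, assuming it is the fraction of simulation runs in which a given model is selected, is shown below. The selector `toy_select`, the candidate list, and the weights are hypothetical stand-ins for "simulate a data set, apply the criterion, return the chosen model"; none of the numbers come from the article.

```python
import random
from collections import Counter


def selection_probabilities(select_model, n_sim=1000, seed=0):
    """Fraction of Monte Carlo runs in which each candidate model is chosen."""
    rng = random.Random(seed)
    counts = Counter(select_model(rng) for _ in range(n_sim))
    return {model: count / n_sim for model, count in counts.items()}


# Hypothetical stand-in for "simulate data, apply the criterion, return the pick".
CANDIDATES = [(1, 4), (1, 4, 5), (1, 3, 4), (1, 2, 3, 4, 5)]
WEIGHTS = [0.90, 0.04, 0.04, 0.02]  # made-up weights, not values from the tables


def toy_select(rng):
    return rng.choices(CANDIDATES, weights=WEIGHTS)[0]


probs = selection_probabilities(toy_select)
for model, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(model, f"{p:.3f}")
```

With 1000 runs, every estimate is a multiple of 0.001, which is consistent with the three-decimal entries in the tables.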
Estimated selection probabilities of , , and based on the least squares estimator and .
| True | Model | m = 15 | m = 20 | m = 25 | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,4,5 | 0.028 | 0.010 | 0.028 | 0.021 | 0.053 | 0.021 | 0.045 | 0.028 | 0.077 | 0.046 | 0.056 | 0.032 | 0.043 | |
| 1,3,4 | 0.029 | 0.010 | 0.029 | 0.018 | 0.071 | 0.018 | 0.057 | 0.027 | 0.116 | 0.054 | 0.076 | 0.037 | 0.047 | |
| 1,2,4 | 0.042 | 0.009 | 0.042 | 0.029 | 0.060 | 0.031 | 0.054 | 0.039 | 0.094 | 0.046 | 0.065 | 0.039 | 0.055 | |
| 1,3,4,5 | 0.001 | 0.000 | 0.001 | 0.001 | 0.006 | 0.001 | 0.004 | 0.001 | 0.013 | 0.003 | 0.007 | 0.002 | 0.004 | |
| 1,2,4,5 | 0.002 | 0.000 | 0.002 | 0.002 | 0.005 | 0.002 | 0.006 | 0.002 | 0.011 | 0.002 | 0.009 | 0.004 | 0.009 | |
| 1,2,3,4 | 0.001 | 0.000 | 0.001 | 0.000 | 0.005 | 0.000 | 0.002 | 0.001 | 0.012 | 0.003 | 0.006 | 0.001 | 0.003 | |
| 1,2,3,4,5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.005 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,3,4,5 | 0.023 | 0.010 | 0.032 | 0.017 | 0.065 | 0.021 | 0.057 | 0.029 | 0.098 | 0.052 | 0.075 | 0.040 | 0.055 | |
| 1,2,4,5 | 0.043 | 0.009 | 0.050 | 0.026 | 0.068 | 0.028 | 0.061 | 0.038 | 0.100 | 0.049 | 0.077 | 0.050 | 0.061 | |
| 1,2,3,4,5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.007 | 0.000 | 0.007 | 0.000 | 0.021 | 0.006 | 0.013 | 0.001 | 0.006 | |
| 1,4,5 | 0.000 | 0.005 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.001 | 0.002 | |
| 1,3,4,5 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.001 | 0.001 | 0.001 | 0.001 | |
| 1,2,3,4,5 | 0.021 | 0.006 | 0.027 | 0.017 | 0.068 | 0.027 | 0.066 | 0.031 | 0.125 | 0.059 | 0.089 | 0.047 | 0.064 | |
| 1,3,4,5 | 0.008 | 0.036 | 0.001 | 0.007 | 0.002 | 0.010 | 0.001 | 0.005 | 0.000 | 0.005 | 0.001 | 0.001 | 0.001 | |
| 1,2,4,5 | 0.001 | 0.004 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | |
| 1,2,3,5 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
The results are based on 1000 Monte Carlo simulations and K = 100 bootstrap replications.
Selection probabilities of , , and based on LS-estimator and .
| True | Model | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1,0,0,1,0) | 1 | 0.000 | 0.002 | 0.002 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1,4,5 | 0.158 | 0.088 | 0.104 | 0.071 | 0.065 | 0.034 | 0.039 | 0.030 | 0.038 | 0.016 | 0.031 | 0.017 | 0.012 | 0.007 | 0.012 | 0.010 | |
| 1,3,4 | 0.132 | 0.073 | 0.080 | 0.063 | 0.084 | 0.036 | 0.052 | 0.032 | 0.036 | 0.016 | 0.026 | 0.018 | 0.017 | 0.008 | 0.015 | 0.010 | |
| 1,2,4 | 0.172 | 0.087 | 0.102 | 0.069 | 0.071 | 0.040 | 0.046 | 0.036 | 0.029 | 0.014 | 0.018 | 0.015 | 0.016 | 0.007 | 0.019 | 0.011 | |
| 1,3,4,5 | 0.041 | 0.006 | 0.016 | 0.004 | 0.003 | 0.001 | 0.002 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,2,4,5 | 0.048 | 0.008 | 0.014 | 0.004 | 0.015 | 0.002 | 0.009 | 0.003 | 0.002 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | |
| 1,2,3,4 | 0.025 | 0.004 | 0.010 | 0.002 | 0.010 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | |
| 1,2,3,4,5 | 0.019 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| (1,0,0,1,1) | 1,5 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1,4 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,3,4,5 | 0.175 | 0.080 | 0.092 | 0.064 | 0.079 | 0.025 | 0.049 | 0.026 | 0.035 | 0.014 | 0.027 | 0.016 | 0.011 | 0.007 | 0.016 | 0.007 | |
| 1,2,4,5 | 0.206 | 0.093 | 0.114 | 0.066 | 0.085 | 0.035 | 0.053 | 0.036 | 0.033 | 0.014 | 0.022 | 0.014 | 0.013 | 0.003 | 0.016 | 0.009 | |
| 1,2,3,4,5 | 0.053 | 0.004 | 0.010 | 0.003 | 0.008 | 0.001 | 0.002 | 0.000 | 0.002 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | |
| (1,1,0,1,1) | |||||||||||||||||
| 1,2,3,4,5 | 0.221 | 0.079 | 0.106 | 0.062 | 0.078 | 0.029 | 0.047 | 0.027 | 0.035 | 0.017 | 0.025 | 0.018 | 0.009 | 0.006 | 0.013 | 0.009 | |
The results are based on L = 1000 MC simulations and K = 50 bootstrap replications.
Estimated selection probabilities of , , and based on MM-estimator and LS-estimator.
| Errors | Model | MM-estimator | LS-estimator | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple bootstrap | Stratified bootstrap | Simple bootstrap | |||||||||||
| 1 | 0.368 | 0.392 | 0.267 | 0.257 | 0.000 | 0.000 | 0.001 | 0.001 | 0.756 | 1.000 | 1.000 | 1.000 | |
| 1,3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.244 | 0.000 | 0.000 | 0.000 | |
| 1,2,3 | 0.342 | 0.246 | 0.011 | 0.005 | 0.084 | 0.032 | 0.002 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.999 | 1.000 | 1.000 | 1.000 | |
| 1,3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,2,3 | 0.065 | 0.032 | 0.017 | 0.014 | 0.114 | 0.042 | 0.022 | 0.016 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| 1,3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,2,3 | 0.106 | 0.048 | 0.053 | 0.037 | 0.131 | 0.054 | 0.057 | 0.040 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| 1,2,3 | 0.109 | 0.039 | 0.067 | 0.044 | 0.141 | 0.061 | 0.081 | 0.052 | 0.131 | 0.051 | 0.071 | 0.042 | |
| 1 | 0.008 | 0.016 | 0.012 | 0.018 | 0.005 | 0.013 | 0.010 | 0.013 | 0.770 | 0.823 | 0.841 | 0.866 | |
| 1,3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | |
| 1,2,3 | 0.038 | 0.010 | 0.027 | 0.009 | 0.065 | 0.021 | 0.038 | 0.018 | 0.002 | 0.001 | 0.000 | 0.000 | |
| 1 | 0.062 | 0.139 | 0.081 | 0.124 | 0.042 | 0.093 | 0.067 | 0.099 | 0.828 | 0.882 | 0.909 | 0.921 | |
| 1,3 | 0.004 | 0.005 | 0.004 | 0.003 | 0.008 | 0.005 | 0.004 | 0.004 | 0.005 | 0.002 | 0.001 | 0.001 | |
| 1,2,3 | 0.049 | 0.014 | 0.033 | 0.018 | 0.071 | 0.024 | 0.037 | 0.022 | 0.003 | 0.000 | 0.000 | 0.000 | |
The results are based on 1000 Monte Carlo (MC) simulations and K = 100 bootstrap replications. The is used for all selection criteria.
Estimated selection probabilities of , , and based on MM estimator.
| Errors | True | Model | 10% X-outliers | 20% X-outliers | ||||||
|---|---|---|---|---|---|---|---|---|---|---|
| 1,5 | 0.002 | 0.004 | 0.002 | 0.002 | 0.008 | 0.034 | 0.007 | 0.018 | ||
| 1,3,5 | 0.005 | 0.006 | 0.008 | 0.008 | 0.034 | 0.035 | 0.036 | 0.033 | ||
| 1,2,5 | 0.002 | 0.004 | 0.003 | 0.003 | 0.011 | 0.016 | 0.013 | 0.015 | ||
| 1,3,4,5 | 0.036 | 0.008 | 0.034 | 0.019 | 0.051 | 0.018 | 0.052 | 0.027 | ||
| 1,2,4,5 | 0.041 | 0.012 | 0.045 | 0.026 | 0.062 | 0.014 | 0.048 | 0.026 | ||
| 1,2,3,4,5 | 0.002 | 0.000 | 0.006 | 0.002 | 0.011 | 0.000 | 0.020 | 0.005 | ||
| 1,5 | 0.008 | 0.032 | 0.017 | 0.027 | 0.018 | 0.093 | 0.029 | 0.056 | ||
| 1,3,5 | 0.021 | 0.030 | 0.033 | 0.033 | 0.034 | 0.039 | 0.038 | 0.046 | ||
| 1,2,5 | 0.013 | 0.028 | 0.016 | 0.021 | 0.019 | 0.017 | 0.021 | 0.022 | ||
| 1,3,4,5 | 0.091 | 0.027 | 0.072 | 0.041 | 0.110 | 0.039 | 0.085 | 0.046 | ||
| 1,2,4,5 | 0.255 | 0.089 | 0.170 | 0.94 | 0.282 | 0.098 | 0.179 | 0.109 | ||
| 1,2,3,4,5 | 0.006 | 0.001 | 0.008 | 0.003 | 0.004 | 0.000 | 0.001 | 0.001 | ||
| 1,5 | 0.004 | 0.021 | 0.003 | 0.007 | 0.026 | 0.077 | 0.027 | 0.055 | ||
| 1,4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.001 | ||
| 1,3,5 | 0.015 | 0.022 | 0.019 | 0.021 | 0.052 | 0.053 | 0.057 | 0.052 | ||
| 1,2,5 | 0.010 | 0.012 | 0.013 | 0.017 | 0.032 | 0.028 | 0.031 | 0.030 | ||
| 1,3,4,5 | 0.045 | 0.012 | 0.042 | 0.018 | 0.042 | 0.013 | 0.030 | 0.017 | ||
| 1,2,4,5 | 0.036 | 0.0.03 | 0.032 | 0.014 | 0.039 | 0.012 | 0.034 | 0.018 | ||
| 1,2,3,4,5 | 0.003 | 0.001 | 0.005 | 0.003 | 0.002 | 0.001 | 0.003 | 0.003 | ||
| 1,5 | 0.001 | 0.004 | 0.001 | 0.002 | 0.005 | 0.019 | 0.008 | 0.015 | ||
| 1,3,5 | 0.005 | 0.007 | 0.003 | 0.005 | 0.029 | 0.035 | 0.032 | 0.032 | ||
| 1,2,5 | 0.003 | 0.005 | 0.005 | 0.004 | 0.012 | 0.012 | 0.009 | 0.010 | ||
| 1,3,4,5 | 0.039 | 0.011 | 0.034 | 0.021 | 0.046 | 0.017 | 0.040 | 0.022 | ||
| 1,2,4,5 | 0.029 | 0.008 | 0.039 | 0.019 | 0.038 | 0.011 | 0.039 | 0.020 | ||
| 1,2,3,4,5 | 0.002 | 0.000 | 0.005 | 0.000 | 0.007 | 0.001 | 0.013 | 0.003 | ||
| 1,5 | 0.045 | 0.114 | 0.050 | 0.081 | 0.162 | 0.274 | 0.162 | 0.251 | ||
| 1,4 | 0.001 | 0.002 | 0.001 | 0.001 | 0.007 | 0.018 | 0.009 | 0.011 | ||
| 1,3,5 | 0.046 | 0.034 | 0.044 | 0.035 | 0.055 | 0.041 | 0.056 | 0.046 | ||
| 1,2,5 | 0.020 | 0.022 | 0.019 | 0.022 | 0.045 | 0.042 | 0.037 | 0.033 | ||
| 1,3,4,5 | 0.026 | 0.004 | 0.017 | 0.009 | 0.024 | 0.009 | 0.016 | 0.008 | ||
| 1,2,4,5 | 0.018 | 0.003 | 0.020 | 0.008 | 0.028 | 0.005 | 0.019 | 0.013 | ||
| 1,2,3,5 | 0.002 | 0.000 | 0.003 | 0.000 | 0.007 | 0.000 | 0.007 | 0.000 | ||
| 1,2,3,4,5 | 0.003 | 0.001 | 0.002 | 0.001 | 0.002 | 0.000 | 0.002 | 0.001 | ||
The results are based on 1000 Monte Carlo simulations and K = 50 bootstrap replications.
Selected best model for the stack loss data using a range of model selection procedures.
| Selected variables | AIC | BIC | ||||
|---|---|---|---|---|---|---|
| X1 | 2.97 | 3.50 | 4.42 | 4.82 | 4.61 | 4.71 |
| X2 | 3.02 | 3.59 | 4.58 | 5.00 | 2.19 | 2.28 |
| X3 | 3.48 | 4.20 | 5.32 | 5.86 | 3.21 | 3.31 |
| X1, X2 | 1.55 | 1.70 | ||||
| X1, X3 | 2.38 | 3.42 | 4.67 | 5.45 | 2.69 | 2.84 |
| X2, X3 | 2.09 | 3.49 | 3.72 | 4.77 | 1.47 | 1.62 |
| X1, X2, X3 | 1.81 | 3.05 | 3.90 | 5.28 |