María Del Mar Rueda1, Sara Pasadas-Del-Amo2, Beatriz Cobo Rodríguez3, Luis Castro-Martín1, Ramón Ferri-García1.
Abstract
Web surveys have replaced face-to-face and computer-assisted telephone interviewing (CATI) as the main mode of data collection in most countries. This trend was reinforced by COVID-19 pandemic-related restrictions. However, this mode still faces significant limitations in obtaining probability-based samples of the general population. For this reason, most web surveys rely on nonprobability survey designs. Whereas probability-based designs continue to be the gold standard in survey sampling, nonprobability web surveys may still prove useful in some situations. For instance, when small subpopulations are the group under study and probability sampling is unlikely to meet sample size requirements, complementing a small probability sample with a larger nonprobability one may improve the efficiency of the estimates. Nonprobability samples may also be designed as a means of compensating for known biases in probability-based web survey samples by purposely targeting respondent profiles that tend to be underrepresented in these surveys. This is the case in the Survey on the impact of the COVID-19 pandemic in Spain (ESPACOV) that motivates this paper. We propose a methodology for combining probability and nonprobability web-based survey samples with the help of machine-learning techniques. We then assess the efficiency of the resulting estimates by comparing them with other strategies that have been used before. Our simulation study and the application of the proposed estimation method to the second wave of the ESPACOV Survey allow us to conclude that this is the best option for reducing the biases observed in our data.
Keywords: COVID-19; machine-learning techniques; nonprobability surveys; propensity score adjustment; survey sampling
Year: 2022 PMID: 36136044 PMCID: PMC9538074 DOI: 10.1002/bimj.202200035
Source DB: PubMed Journal: Biom J ISSN: 0323-3847 Impact factor: 1.715
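The abstract describes combining a probability and a nonprobability sample through propensity score adjustment (PSA) with machine-learning techniques. As a rough illustration only, the sketch below estimates participation propensities with a hand-written logistic regression (a stand-in, not the XGBoost model named in the table captions on this page) and turns them into weights for the nonprobability units. The toy data, the function names, and the weight formula (1 - p) / p are assumptions made for illustration and are not taken from the paper.

```python
import math

def predict_proba(beta, x):
    """Logistic model: P(unit belongs to the nonprobability sample | x)."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X, z, lr=1.0, epochs=3000):
    """Fit logistic regression by batch gradient descent (pure Python)."""
    n, d = len(X), len(X[0])
    beta = [0.0] * (d + 1)  # intercept plus one coefficient per covariate
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, zi in zip(X, z):
            err = predict_proba(beta, xi) - zi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def weighted_share(values, w):
    """Weighted proportion of a 0/1 variable."""
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)

# Hypothetical toy data: one covariate, 1 = aged 65 or more.
prob_X = [[1]] * 3 + [[0]] * 37       # probability sample, 7.5% aged 65+
nonprob_X = [[1]] * 17 + [[0]] * 23   # nonprobability sample, 42.5% aged 65+

# Pool both samples; z flags membership in the nonprobability sample.
X = prob_X + nonprob_X
z = [0] * len(prob_X) + [1] * len(nonprob_X)
beta = fit_logistic(X, z)

# Assumed PSA weight for each nonprobability unit: (1 - p) / p.
weights = [(1 - p) / p for p in (predict_proba(beta, x) for x in nonprob_X)]

raw_share = sum(x[0] for x in nonprob_X) / len(nonprob_X)
adj_share = weighted_share([x[0] for x in nonprob_X], weights)
print(round(raw_share, 3), round(adj_share, 3))
```

With this weight choice, the weighted share of older respondents in the toy nonprobability sample moves from 0.425 to roughly the probability sample's 0.075. A tree-based learner could replace `fit_logistic` without changing the weighting step.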
Population data sources
| Variable | Category | Probability | Nonprobability | Blended sample (unweighted) | Population |
|---|---|---|---|---|---|
| Gender | Male | 48.4% | 40.7% | 45.3% | 48.5% |
| | Female | 51.6% | 59.3% | 54.7% | 51.5% |
| Age | 18–29 | 18.2% | 3.3% | 12.1% | 15.0% |
| | 30–44 | 33.0% | 15.8% | 26.0% | 25.4% |
| | 45–64 | 41.3% | 37.4% | 39.7% | 35.9% |
| | 65 or more | 7.5% | 43.5% | 22.2% | 23.7% |
| Age (mean) | | 44.2 | 58.3 | 50 | 51 |
| Education level | First degree | 21.0% | 20.7% | 20.9% | 17.1% |
| | Second degree | 18.7% | 26.1% | 21.7% | 49.1% |
| | Higher ED | 60.3% | 53.2% | 57.4% | 33.8% |
| Labor status | Employed | 69.2% | 41.3% | 57.8% | 48.5% |
| | Unemployed | 9.1% | 6.4% | 8.0% | 9.2% |
| | Inactive | 21.7% | 52.3% | 34.2% | 42.3% |
Continuous population register, official population data as of January 1, 2021
Economically active population survey (EAPS), first quarter 2021.
National Statistics Institute of Spain (INE).
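The table above shows where the unweighted blended sample departs from the population margins. As a hedged illustration of how a weighting adjustment reacts to those gaps, the sketch below computes one-dimensional post-stratification factors (population share divided by sample share) for the age groups. This is a simplified stand-in for calibration on a single variable, not the calibration procedure used in the paper, and the dictionary and function names are hypothetical.

```python
# Age distribution (%) copied from the table above.
blended = {"18-29": 12.1, "30-44": 26.0, "45-64": 39.7, "65 or more": 22.2}
population = {"18-29": 15.0, "30-44": 25.4, "45-64": 35.9, "65 or more": 23.7}

def poststrat_factors(sample_pct, pop_pct):
    """Return population share / sample share for each category."""
    return {k: pop_pct[k] / sample_pct[k] for k in sample_pct}

factors = poststrat_factors(blended, population)
for group, factor in factors.items():
    print(f"{group}: {factor:.3f}")

# Applying the factors reproduces the population margin by construction.
adjusted = {k: blended[k] * factors[k] for k in blended}
```

Factors above 1 (here the 18–29 and 65 or more groups) mark categories that are underrepresented in the blended sample, and factors below 1 mark overrepresented ones.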
Values of |RB| and RMSRE, for each estimator and combination of sample sizes, in the estimation of target variable y₁

| Estimator | \|RB\| | RMSRE | \|RB\| | RMSRE | \|RB\| | RMSRE |
|---|---|---|---|---|---|---|
| | 4.031 | 4.282 | 3.935 | 4.179 | 3.990 | 4.188 |
| | 1.954 | 2.415 | 2.085 | 2.584 | 2.004 | 2.488 |
| | 2.025 | 2.537 | 2.909 | 3.574 | 3.661 | 4.506 |
| | 3.641 | 3.926 | 3.574 | 3.856 | 3.644 | 3.856 |
| | 5.489 | 5.663 | 8.141 | 8.211 | 9.673 | 9.698 |
| | 5.489 | 5.663 | 8.141 | 8.211 | 9.673 | 9.698 |
| | 2.562 | 2.990 | 2.641 | 3.125 | 3.359 | 3.701 |
| | 5.865 | 6.028 | 7.462 | 7.546 | 8.621 | 8.652 |
| | 3.057 | 3.425 | 2.544 | 2.966 | 2.005 | 2.391 |
| | 4.379 | 4.603 | 4.269 | 4.488 | 4.329 | 4.498 |
| | 3.496 | 3.851 | 3.081 | 3.493 | 2.320 | 2.722 |
| | 5.648 | 5.816 | 6.744 | 6.837 | 7.636 | 7.678 |
| | 3.270 | 3.621 | 2.702 | 3.120 | 2.084 | 2.465 |
| | 4.425 | 4.646 | 4.302 | 4.516 | 4.359 | 4.521 |
| | 4.495 | 4.792 | 4.327 | 4.707 | 3.470 | 3.937 |
| | 6.131 | 6.283 | 7.109 | 7.195 | 7.847 | 7.887 |
| | 1.869 | 2.305 | 1.920 | 2.360 | 1.757 | 2.185 |
| | 1.912 | 2.360 | 1.980 | 2.445 | 1.897 | 2.369 |
Values of |RB| and RMSRE, for each estimator and combination of sample sizes, in the estimation of target variable y₂

| Estimator | \|RB\| | RMSRE | \|RB\| | RMSRE | \|RB\| | RMSRE |
|---|---|---|---|---|---|---|
| | 4.453 | 4.749 | 4.449 | 4.701 | 4.501 | 4.696 |
| | 2.423 | 3.045 | 2.274 | 2.904 | 2.268 | 2.850 |
| | 2.032 | 2.512 | 2.702 | 3.387 | 3.607 | 4.466 |
| | 3.112 | 3.546 | 3.053 | 3.491 | 3.183 | 3.600 |
| | 7.106 | 7.296 | 9.933 | 10.011 | 11.459 | 11.492 |
| | 7.106 | 7.296 | 9.933 | 10.011 | 11.459 | 11.492 |
| | 2.496 | 2.985 | 2.547 | 3.065 | 3.143 | 3.653 |
| | 6.090 | 6.393 | 8.011 | 8.160 | 9.414 | 9.493 |
| | 3.376 | 3.765 | 2.740 | 3.181 | 2.298 | 2.796 |
| | 4.153 | 4.463 | 4.105 | 4.405 | 4.190 | 4.424 |
| | 3.745 | 4.096 | 3.243 | 3.653 | 2.522 | 3.004 |
| | 5.371 | 5.582 | 6.400 | 6.523 | 7.262 | 7.336 |
| | 3.505 | 3.854 | 2.871 | 3.275 | 2.311 | 2.762 |
| | 4.207 | 4.478 | 4.119 | 4.360 | 4.201 | 4.401 |
| | 4.825 | 5.109 | 4.610 | 4.970 | 3.779 | 4.321 |
| | 5.832 | 6.016 | 6.741 | 6.852 | 7.459 | 7.528 |
| | 1.932 | 2.394 | 1.821 | 2.287 | 1.740 | 2.166 |
| | 1.961 | 2.434 | 1.879 | 2.369 | 1.855 | 2.311 |
Values of |RB| and RMSRE, for each estimator and combination of sample sizes, in the estimation of target variable y₃

| Estimator | \|RB\| | RMSRE | \|RB\| | RMSRE | \|RB\| | RMSRE |
|---|---|---|---|---|---|---|
| | 9.652 | 10.500 | 9.677 | 10.392 | 9.510 | 10.125 |
| | 5.657 | 6.991 | 5.506 | 6.897 | 5.680 | 7.101 |
| | 5.446 | 6.785 | 7.462 | 9.034 | 9.951 | 11.792 |
| | 8.748 | 9.676 | 8.752 | 9.515 | 8.690 | 9.379 |
| | 13.114 | 13.715 | 19.896 | 20.108 | 23.223 | 23.307 |
| | 13.114 | 13.715 | 19.896 | 20.108 | 23.223 | 23.307 |
| | 6.198 | 7.467 | 6.274 | 7.530 | 8.006 | 8.910 |
| | 14.009 | 14.588 | 18.213 | 18.473 | 20.660 | 20.767 |
| | 7.377 | 8.550 | 6.437 | 7.581 | 5.031 | 6.133 |
| | 10.489 | 11.302 | 10.436 | 11.102 | 10.333 | 10.901 |
| | 8.403 | 9.512 | 7.505 | 8.658 | 5.720 | 6.839 |
| | 13.498 | 14.127 | 16.461 | 16.795 | 18.259 | 18.403 |
| | 7.854 | 8.995 | 6.822 | 7.952 | 5.221 | 6.312 |
| | 10.590 | 11.385 | 10.494 | 11.151 | 10.394 | 10.944 |
| | 10.641 | 11.695 | 10.851 | 11.985 | 8.403 | 9.855 |
| | 14.644 | 15.224 | 17.307 | 17.623 | 18.765 | 18.904 |
| | 5.255 | 6.453 | 5.300 | 6.512 | 5.155 | 6.435 |
| | 5.272 | 6.474 | 5.849 | 7.303 | 5.214 | 6.509 |
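The three tables above summarize each estimator by |RB| and RMSRE. This page does not spell out the formulas, so the sketch below assumes the usual simulation definitions: the absolute relative bias is the absolute difference between the mean of the simulated estimates and the true value, divided by the true value, and the RMSRE is the root mean squared deviation from the true value, divided by the true value, both in percent. The function names and the toy numbers are illustrative only.

```python
import math

def abs_relative_bias(estimates, truth):
    """|RB| in percent: |mean(estimates) - truth| / |truth| * 100."""
    mean_est = sum(estimates) / len(estimates)
    return abs(mean_est - truth) / abs(truth) * 100

def rmsre(estimates, truth):
    """RMSRE in percent: sqrt(mean squared error) / |truth| * 100."""
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / abs(truth) * 100

# Hypothetical estimates from five simulation runs of one estimator.
estimates = [0.52, 0.49, 0.51, 0.53, 0.50]
truth = 0.50

rb = abs_relative_bias(estimates, truth)
rm = rmsre(estimates, truth)
print(f"|RB| = {rb:.3f}%, RMSRE = {rm:.3f}%")
```

Under these definitions RMSRE can never be smaller than |RB|, because it adds the run-to-run variability of the estimator to its bias. Every row of the tables is consistent with that ordering.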
Mean jackknife estimate of the variance and confidence intervals' mean coverage and length from the simulation runs in the estimation of target variable y₁

| Estimator | J. variance | Coverage | Length | J. variance | Coverage | Length | J. variance | Coverage | Length |
|---|---|---|---|---|---|---|---|---|---|
| | 1.801 | 0.610 | 3.867 | 3.083 | 0.628 | 4.956 | 4.718 | 0.558 | 5.819 |
| | 0.099 | 0.948 | 1.230 | 0.099 | 0.936 | 1.231 | 0.100 | 0.952 | 1.238 |
| | 0.441 | 0.938 | 1.888 | 2.890 | 0.940 | 4.480 | 7.217 | 0.968 | 8.536 |
| | 1.102 | 0.820 | 3.254 | 1.720 | 0.884 | 4.148 | 3.475 | 0.972 | 6.075 |
| | 0.086 | 0.942 | 1.149 | 0.084 | 0.932 | 1.135 | 0.088 | 0.948 | 1.152 |
| | 0.089 | 0.940 | 1.169 | 0.090 | 0.940 | 1.173 | 0.104 | 0.952 | 1.259 |
Mean jackknife estimate of the variance and confidence intervals' mean coverage and length from the simulation runs in the estimation of target variable y₂

| Estimator | J. variance | Coverage | Length | J. variance | Coverage | Length | J. variance | Coverage | Length |
|---|---|---|---|---|---|---|---|---|---|
| | 19.259 | 0.582 | 12.054 | 36.202 | 0.568 | 16.214 | 54.922 | 0.542 | 19.548 |
| | 0.114 | 0.896 | 1.321 | 0.113 | 0.908 | 1.314 | 0.114 | 0.914 | 1.320 |
| | 3.104 | 0.932 | 3.765 | 9.887 | 0.960 | 8.360 | 26.854 | 0.976 | 15.752 |
| | 14.488 | 0.830 | 10.963 | 23.012 | 0.888 | 14.303 | 43.755 | 0.974 | 20.984 |
| | 0.102 | 0.932 | 1.248 | 0.099 | 0.934 | 1.229 | 0.108 | 0.954 | 1.273 |
| | 0.105 | 0.932 | 1.265 | 0.104 | 0.942 | 1.262 | 0.121 | 0.972 | 1.359 |
Mean jackknife estimate of the variance and confidence intervals' mean coverage and length from the simulation runs in the estimation of target variable y₃

| Estimator | J. variance | Coverage | Length | J. variance | Coverage | Length | J. variance | Coverage | Length |
|---|---|---|---|---|---|---|---|---|---|
| | 0.017 | 0.708 | 0.377 | 0.028 | 0.648 | 0.466 | 0.045 | 0.636 | 0.570 |
| | 0.001 | 0.952 | 0.140 | 0.001 | 0.960 | 0.141 | 0.001 | 0.960 | 0.141 |
| | 0.005 | 0.944 | 0.196 | 0.021 | 0.946 | 0.431 | 0.071 | 0.974 | 0.856 |
| | 0.011 | 0.882 | 0.323 | 0.023 | 0.944 | 0.455 | 0.034 | 0.976 | 0.602 |
| | 0.001 | 0.962 | 0.133 | 0.001 | 0.930 | 0.132 | 0.001 | 0.934 | 0.127 |
| | 0.001 | 0.964 | 0.134 | 0.001 | 0.934 | 0.142 | 0.001 | 0.972 | 0.142 |
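The three tables above report jackknife variance estimates together with the coverage and length of the resulting confidence intervals. As a minimal sketch, assuming a delete-one jackknife and normal-approximation 95% intervals (this page does not state which jackknife variant or interval the authors used), the following computes the variance estimate for a sample mean and the matching interval. The helper names and the toy data are hypothetical.

```python
import math

def jackknife_variance(data, estimator):
    """Delete-one jackknife variance of estimator(data)."""
    n = len(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)

def normal_ci(estimate, variance, z=1.96):
    """Normal-approximation interval: estimate +/- z * sqrt(variance)."""
    half = z * math.sqrt(variance)
    return estimate - half, estimate + half

def mean(xs):
    return sum(xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy sample
true_value = 3.0                   # assumed known, as in a simulation

var_jk = jackknife_variance(data, mean)
low, high = normal_ci(mean(data), var_jk)
length = high - low
covers = low <= true_value <= high
print(var_jk, length, covers)
```

In a simulation, coverage would be the fraction of runs in which `covers` is true and length the average of `high - low` across runs. Under this normal-approximation assumption, a variance of 0.099 corresponds to a length of about 2 × 1.96 × √0.099 ≈ 1.23, which is close to the lengths reported next to that variance in the first table.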
Estimates of selected variables on the direct impact of COVID‐19 in Spain from integrated data using a new estimation method based on calibration and XGBoost PSA and direct calibration of the integrated sample.

| Variable | Probability sample: Estimation | Probability sample: CI | Nonprobability sample: Estimation | Nonprobability sample: CI | Integrated sample: Estimation | Integrated sample: CI | Integrated sample: Estimation | Integrated sample: CI |
|---|---|---|---|---|---|---|---|---|
| | 0.127 | 0.106–0.148 | 0.095 | 0.073–0.117 | 0.112 | 0.090–0.134 | 0.122 | 0.102–0.142 |
| | 0.306 | 0.277–0.336 | 0.291 | 0.256–0.325 | 0.299 | 0.265–0.333 | 0.285 | 0.258–0.313 |
| | 0.147 | 0.076–0.219 | 0.146 | 0.045–0.247 | 0.188 | 0.123–0.253 | 0.186 | 0.113–0.259 |
| | 0.716 | 0.625–0.807 | 0.688 | 0.555–0.820 | 0.650 | 0.567–0.734 | 0.660 | 0.484–0.837 |
| | 0.116 | 0.051–0.180 | 0.104 | 0.017–0.191 | 0.111 | 0.063–0.159 | 0.104 | 0.019–0.189 |
| | 0.021 | 0.000–0.050 | 0.042 | 0.000–0.099 | 0.012 | 0.000–0.031 | 0.037 | 0.000–0.088 |
Estimates of selected variables on indirect effects of COVID‐19 in Spain from integrated data using a new estimation method based on calibration and XGBoost PSA and direct calibration of the integrated sample.

| Variable | Probability sample: Estimation | Probability sample: CI | Nonprobability sample: Estimation | Nonprobability sample: CI | Integrated sample: Estimation | Integrated sample: CI | Integrated sample: Estimation | Integrated sample: CI |
|---|---|---|---|---|---|---|---|---|
| | 0.067 | 0.051–0.083 | 0.076 | 0.056–0.096 | 0.064 | 0.046–0.082 | 0.075 | 0.058–0.092 |
| | 0.277 | 0.248–0.305 | 0.235 | 0.203–0.267 | 0.265 | 0.228–0.303 | 0.261 | 0.234–0.288 |
| | 0.425 | 0.393–0.456 | 0.305 | 0.269–0.340 | 0.398 | 0.365–0.431 | 0.403 | 0.374–0.431 |
FIGURE 1 Estimation of the selected variables, including confidence intervals