Sebastian Kocar, Nicholas Biddle.
Abstract
The objective of this study is to identify factors affecting participation rates, i.e., nonresponse and voluntary attrition rates, and their predictive power in a probability-based online panel. Participation for this panel had already been investigated in the literature according to the socio-demographic and socio-psychological characteristics of respondents, and different types of paradata, such as device type or questionnaire navigation, had also been explored. In this study, the predictive power of online panel participation paradata was instead evaluated, which was expected (at least in theory) to offer even more complex insight into respondents' behavior over time. This kind of paradata would also enable the derivation of longitudinal variables measuring respondents' panel activity, such as survey outcome rates and consecutive waves with a particular survey outcome prior to a wave (e.g., response, noncontact, refusal), and could also be used in models controlling for unobserved heterogeneity. Using the Life in Australia™ participation data for all recruited members for the first 30 waves, multiple linear, binary logistic and panel random-effect logit regression analyses were carried out to assess socio-demographic and online panel paradata predictors of nonresponse and attrition that were available and contributed to the accuracy of prediction and the best statistical modeling. The proposed approach with the derived paradata predictors and random-effect logistic regression proved to be reasonably accurate for predicting nonresponse: with just 15 waves of online panel paradata (even without sociodemographics) and logit random-effect modeling, almost four out of five nonrespondents could be correctly identified in the subsequent wave.
Keywords: Online panel paradata; Panel voluntary attrition; Prediction modeling; Random-effect logit model; Unit nonresponse
Year: 2022 PMID: 35493336 PMCID: PMC9036512 DOI: 10.1007/s11135-022-01385-x
Source DB: PubMed Journal: Qual Quant ISSN: 0033-5177
Statistical models used in this study (by research question)
| Research question | Model | Outcome variable | Predictors |
|---|---|---|---|
| RQ1 | Multiple linear regression model | Individual survey completion rate (a) | Socio-demographics (c) |
| RQ1 | Binary logistic regression model | Individual voluntary attrition at any point in time (b) | Socio-demographics (c) |
| RQ2 | Binary (pooled) logistic regression model | Voluntary attrition in a particular wave (d) | Online panel paradata variables (with and without socio-demographics (c)) |
| RQ2 | Binary (pooled) logistic regression model | Nonresponse in a particular wave (e) | Online panel paradata variables (with and without socio-demographics (c)) |
| RQ3 | Binary (pooled) logistic regression models | Nonresponse in a particular wave (e) | Online panel paradata variables and socio-demographics (c) |
| RQ3 | Random-effect logit models | Nonresponse in a particular wave (e) | Online panel paradata variables and socio-demographics (c) |
| RQ4, RQ5 | Random-effect logit models | Nonresponse in a particular wave (e) | Online panel paradata variables and socio-demographics (c) |
Fixed-effect models are added as a sensitivity analysis; see Tables 5 and 6 in the Appendix.
(a) Calculated as: (number of all completed questionnaires / all panel waves invited to)
(b) A binary variable with values: 1 = opted out in the first 30 waves, 0 = still a panel member after 30 waves
(c) Gender, education, capital city in state, born in Australia, only English spoken at home, indigenous status, other healthcare card, carer status, population (online, offline), age group, Socio-Economic Indexes for Areas (we performed multiple imputations for missing socio-demographic data in Stata)
(d) A binary variable with values: 1 = opted out in wave n, 0 = remained in the panel after wave n
(e) A binary variable with values: 1 = nonresponse in wave n, 0 = survey completion in wave n
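The outcome definitions in the footnotes above can be illustrated with a minimal Python sketch. The outcome codes (`"complete"`, `"optout"`, and so on), the use of `None` for waves a panellist was not invited to, and the function names are all assumptions made for illustration; they are not taken from the study's data files.

```python
from typing import Optional, Sequence

# Hypothetical coding of one panellist's history, one entry per wave:
# "complete", "noncontact", "refusal", "nonrefusal", "optout",
# or None when the panellist was not invited to that wave.
Outcome = Optional[str]


def completion_rate(outcomes: Sequence[Outcome]) -> float:
    """Completed questionnaires divided by the number of waves invited to."""
    invited = [o for o in outcomes if o is not None]
    if not invited:
        raise ValueError("panellist was never invited to a wave")
    return sum(o == "complete" for o in invited) / len(invited)


def voluntary_attrition(outcomes: Sequence[Outcome]) -> int:
    """1 if the panellist opted out in any recorded wave, otherwise 0."""
    return int(any(o == "optout" for o in outcomes))


# Toy history (not study data): invited to four of five waves.
history = ["complete", "complete", None, "refusal", "complete"]
print(completion_rate(history))      # 3 completed out of 4 invited waves
print(voluntary_attrition(history))  # no opt-out recorded
```

Multiplying the rate by 100 would put it on the percentage scale used in the descriptive tables.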
Logit regression, random-effect and fixed-effect within-person logistic regression results: the effect of previous response trends on nonresponse in a given wave (2990 persons, waves 1–30)
| Predictor | Pooled logit: Coef | Pooled logit: L 95% CI | Pooled logit: U 95% CI | Pooled logit: p value | Random-effect: Coef | Random-effect: L 95% CI | Random-effect: U 95% CI | Random-effect: p value | Fixed-effect: Coef | Fixed-effect: L 95% CI | Fixed-effect: U 95% CI | Fixed-effect: p value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Participation rate | − 2.73 | − 3.40 | − 2.06 | < 0.001** | − 3.34 | − 4.13 | − 2.55 | < 0.001** | − 4.98 | − 5.85 | − 4.10 | < 0.001** |
| Non-contact rate | 0.49 | − 0.17 | 1.15 | 0.145 | − 0.18 | − 0.99 | 0.62 | 0.657 | − 3.84 | − 4.75 | − 2.92 | < 0.001** |
| Refusal rate | 0.49 | − 0.37 | 1.34 | 0.264 | − 0.15 | − 1.23 | 0.92 | 0.781 | − 3.10 | − 4.36 | − 1.85 | < 0.001** |
| Non-refusal rate | 1.05 | 0.37 | 1.73 | 0.003** | − 0.11 | − 0.95 | 0.74 | 0.807 | − 4.45 | − 5.41 | − 3.50 | < 0.001** |
| Charity rate | 0.40 | 0.30 | 0.49 | < 0.001** | 0.68 | 0.53 | 0.83 | < 0.001** | 0.50 | 0.19 | 0.80 | 0.002** |
| Consecutive participation | 0.03 | 0.01 | 0.05 | < 0.001** | 0.04 | 0.02 | 0.05 | < 0.001** | 0.00 | − 0.02 | 0.02 | 0.962 |
| Consecutive response | − 0.12 | − 0.13 | − 0.11 | < 0.001** | − 0.09 | − 0.10 | − 0.07 | < 0.001** | 0.02 | 0.01 | 0.04 | 0.001** |
| Consecutive non-contact | 0.55 | 0.52 | 0.58 | < 0.001** | 0.45 | 0.41 | 0.48 | < 0.001** | 0.49 | 0.45 | 0.52 | < 0.001** |
| Consecutive refusal | 1.10 | 0.87 | 1.32 | < 0.001** | 0.98 | 0.74 | 1.23 | < 0.001** | 1.02 | 0.77 | 1.27 | < 0.001** |
| Consecutive non-refusal | 0.33 | 0.27 | 0.39 | < 0.001** | 0.21 | 0.15 | 0.28 | < 0.001** | 0.32 | 0.26 | 0.39 | < 0.001** |
| Consecutive charity donations | 0.01 | 0.00 | 0.03 | 0.024* | 0.02 | 0.00 | 0.03 | 0.015* | 0.03 | 0.01 | 0.05 | < 0.001** |
| Change from interview to other | 0.09 | 0.01 | 0.17 | 0.028* | 0.15 | 0.06 | 0.23 | 0.001** | 0.24 | 0.15 | 0.32 | < 0.001** |
| Change from other to refusal | 0.51 | 0.20 | 0.82 | 0.001** | 0.45 | 0.12 | 0.78 | 0.008** | 0.43 | 0.09 | 0.76 | 0.012* |
| Constant | 1.27 | 0.65 | 1.90 | < 0.001** | ||||||||
| Pseudo R-squared | 0.415 | |||||||||||
Pooled logit regression and random-effect models include the following controls: gender, age, education, capital, born in Australia, only English spoken at home, Indigenous status, another health card, carer status, online/offline population and SEIFA
Coef = model regression coefficient; L 95% CI = lower limit of the 95% confidence interval; U 95% CI = upper limit of the 95% confidence interval
*Significant at the 0.05 level
**Significant at the 0.01 level
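Because the coefficients above are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The sketch below applies this to the random-effect estimate for consecutive refusal (0.98, 95% CI 0.74 to 1.23) from the table; the helper name `odds_ratio` is hypothetical. In a random-effect model the resulting odds ratio is conditional on the individual's random effect.

```python
import math


def odds_ratio(coef: float, lower: float, upper: float) -> tuple:
    """Convert a logit coefficient and its CI limits to the odds-ratio scale."""
    return tuple(round(math.exp(x), 2) for x in (coef, lower, upper))


# Random-effect model, "Consecutive refusal" row of the table above.
or_point, or_low, or_high = odds_ratio(0.98, 0.74, 1.23)
print(or_point, or_low, or_high)
```

Under this reading, each additional consecutive refusal before wave n multiplies the odds of nonresponse in wave n by roughly 2.7, holding the other predictors constant.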
Logit regression, random-effect and fixed-effect within-person logistic regression results, online and offline samples: the effect of previous response trends on voluntary panel attrition in a given wave (2990 persons, waves 1–30)
| Predictor | Pooled logit: Coef | Pooled logit: L 95% CI | Pooled logit: U 95% CI | Pooled logit: p value | Random-effect: Coef | Random-effect: L 95% CI | Random-effect: U 95% CI | Random-effect: p value | Fixed-effect: Coef | Fixed-effect: L 95% CI | Fixed-effect: U 95% CI | Fixed-effect: p value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Participation rate | − 2.37 | − 4.94 | 0.21 | 0.072 | − 2.60 | − 5.26 | 0.06 | 0.056 | − 19.87 | − 24.57 | − 15.17 | < 0.001** |
| Non-contact rate | − 1.32 | − 3.84 | 1.21 | 0.307 | − 1.45 | − 4.03 | 1.12 | 0.269 | − 14.27 | − 18.95 | − 9.60 | < 0.001** |
| Refusal rate | 1.14 | − 1.39 | 3.67 | 0.377 | 1.17 | − 1.40 | 3.75 | 0.371 | − 9.62 | − 15.30 | − 3.93 | 0.001** |
| Non-refusal rate | − 0.32 | − 2.88 | 2.23 | 0.805 | − 0.46 | − 3.07 | 2.14 | 0.728 | − 15.65 | − 20.49 | − 10.80 | < 0.001** |
| Charity rate | 0.91 | 0.50 | 1.31 | < 0.001** | 0.94 | 0.51 | 1.36 | < 0.001** | − 0.97 | − 2.84 | 0.90 | 0.311 |
| Consecutive participation | − 0.01 | − 0.09 | 0.06 | 0.714 | − 0.01 | − 0.09 | 0.07 | 0.843 | 0.35 | 0.22 | 0.47 | < 0.001** |
| Consecutive response | − 0.14 | − 0.20 | − 0.08 | < 0.001** | − 0.14 | − 0.20 | − 0.08 | < 0.001** | − 0.12 | − 0.21 | − 0.02 | 0.021* |
| Consecutive non-contact | 0.05 | 0.00 | 0.09 | 0.059 | 0.05 | 0.00 | 0.10 | 0.052 | 0.66 | 0.50 | 0.81 | < 0.001** |
| Consecutive refusal | 0.81 | 0.60 | 1.03 | < 0.001** | 0.84 | 0.61 | 1.08 | < 0.001** | 1.03 | 0.65 | 1.41 | < 0.001** |
| Consecutive non-refusal | − 0.01 | − 0.22 | 0.19 | 0.905 | − 0.01 | − 0.22 | 0.21 | 0.956 | 0.49 | 0.19 | 0.79 | 0.001** |
| Consecutive charity donations | 0.06 | 0.00 | 0.12 | 0.044* | 0.06 | 0.00 | 0.12 | 0.046* | 0.20 | 0.10 | 0.30 | < 0.001** |
| Change from interview to other | 0.14 | − 0.24 | 0.53 | 0.462 | 0.16 | − 0.23 | 0.55 | 0.416 | 0.88 | 0.40 | 1.35 | < 0.001** |
| Change from other to refusal | 1.07 | 0.67 | 1.46 | < 0.001** | 1.03 | 0.61 | 1.44 | < 0.001** | 0.88 | 0.36 | 1.40 | 0.001** |
| Constant | − 4.64 | − 7.06 | − 2.22 | < 0.001** | − 4.65 | − 7.09 | − 2.21 | < 0.001** | ||||
| Pseudo R-squared | 0.147 | |||||||||||
Pooled logit regression model includes the following controls: gender, age, education, capital, born in Australia, only English spoken at home, Indigenous status, another health card, carer status, online/offline population and SEIFA
Coef = model regression coefficient; L 95% CI = lower limit of the 95% confidence interval; U 95% CI = upper limit of the 95% confidence interval
*Significant at the 0.05 level
**Significant at the 0.01 level
Derived variables as exogenous covariates/predictors of panel participation
| Predictor | Calculation |
|---|---|
| Participation rate | |
| Non-contact rate | |
| Refusal rate | |
| Non-refusal rate | |
| Charity rate (a) | |
| Consecutive participation | Consecutive waves prior to wave n with completed questionnaires (invited or not) |
| Consecutive response | Consecutive waves prior to wave n with completed questionnaires (waves invited to only) |
| Consecutive non-contact | Consecutive waves prior to wave n with noncontact survey outcome (waves invited to only) |
| Consecutive refusal | Consecutive waves prior to wave n with refusal survey outcome (waves invited to only) |
| Consecutive non-refusal | Consecutive waves prior to wave n with non-refusal survey outcome (waves invited to only) |
| Consecutive charity donations | Consecutive waves prior to wave n with donations to charities (waves with completed questionnaires only) |
| Change from interview to other | Interview survey outcome in wave n−2 and nonresponse (non-contact, refusal, or non-refusal) in wave n−1 (waves invited to only) |
| Change from other to refusal | Interview, non-contact, or non-refusal survey outcome in wave n−2 and refusal in wave n−1 (waves invited to only) |
(a) Charity rate is a special type of rate and is not one of the standard survey outcome rates. Yet it is associated with motivation to participate in online panel surveys and could be treated as a type of panel behavior measured with online panel paradata. The same can be said for consecutive charity donations
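The "consecutive" and "change" predictors defined in the table can be derived from a panellist's wave-by-wave outcome history. The following is a minimal Python sketch under stated assumptions: the outcome codes, the use of `None` for waves not invited to, and the reading of "waves invited to only" as skipping uninvited waves are illustrative choices, not the authors' implementation.

```python
from typing import Optional, Sequence

Outcome = Optional[str]  # None = not invited to that wave (assumed coding)
NONRESPONSE = {"noncontact", "refusal", "nonrefusal"}


def consecutive_prior(outcomes: Sequence[Outcome], n: int, target: str) -> int:
    """Count consecutive invited waves immediately before wave n (1-based)
    whose outcome equals `target`; uninvited waves are skipped."""
    count = 0
    for outcome in reversed(outcomes[: n - 1]):
        if outcome is None:
            continue
        if outcome != target:
            break
        count += 1
    return count


def _last_two_invited(outcomes: Sequence[Outcome], n: int):
    """Outcomes of the two most recent invited waves before wave n, or None."""
    invited = [o for o in outcomes[: n - 1] if o is not None]
    return invited[-2:] if len(invited) >= 2 else None


def change_interview_to_other(outcomes: Sequence[Outcome], n: int) -> int:
    """1 if a completed interview was followed by nonresponse just before wave n."""
    pair = _last_two_invited(outcomes, n)
    return int(pair is not None and pair[0] == "complete" and pair[1] in NONRESPONSE)


def change_other_to_refusal(outcomes: Sequence[Outcome], n: int) -> int:
    """1 if a non-refusal outcome was followed by a refusal just before wave n."""
    pair = _last_two_invited(outcomes, n)
    return int(pair is not None and pair[0] != "refusal" and pair[1] == "refusal")


# Toy history (not study data) for waves 1 to 4; predictors computed for wave 5.
history = ["complete", "complete", "noncontact", "refusal"]
print(consecutive_prior(history, 5, "refusal"))
print(change_other_to_refusal(history, 5))
```

Computing each predictor only from waves before wave n keeps it usable for forecasting, since nothing observed in wave n itself enters the calculation.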
Survey response percentage and attritor sample statistics (n = 2990)
| Characteristic | n | Survey response %: Mean | Survey response %: SD | Voluntary attritor (in any wave, in %): No | Voluntary attritor (in any wave, in %): Yes |
|---|---|---|---|---|---|
| Gender | |||||
| Female | 1576 | 76.60 | 29.52 | 86.42 | 13.58 |
| Male | 1403 | 74.55 | 31.12 | 86.60 | 13.40 |
| Education | |||||
| Bachelor or higher | 1127 | 78.86 | 28.91 | 88.11 | 11.89 |
| Certificate/diploma/trade | 1062 | 73.59 | 30.83 | 87.01 | 12.99 |
| Year 12 or equivalent | 343 | 72.65 | 31.41 | 88.63 | 11.37 |
| Year 11 or less | 458 | 74.33 | 30.86 | 79.69 | 20.31 |
| Capital city in state | |||||
| No | 999 | 76.93 | 29.34 | 86.79 | 13.21 |
| Yes | 1966 | 75.59 | 30.21 | 86.52 | 13.48 |
| Born in Australia | |||||
| No | 820 | 72.75 | 31.81 | 86.10 | 13.90 |
| Yes | 2160 | 76.72 | 29.62 | 86.71 | 13.29 |
| Only English spoken at home | |||||
| No | 442 | 65.69 | 35.11 | 89.14 | 10.86 |
| Yes | 2547 | 77.32 | 29.03 | 86.06 | 13.94 |
| Indigenous status | |||||
| No | 2921 | 75.74 | 30.18 | 86.41 | 13.59 |
| Yes | 64 | 69.00 | 34.61 | 92.19 | 7.81 |
| Other healthcare card | |||||
| No | 1965 | 74.72 | 30.61 | 87.33 | 12.67 |
| Yes | 992 | 78.11 | 29.05 | 85.08 | 14.92 |
| Carer status | |||||
| No | 2400 | 74.37 | 30.97 | 85.58 | 14.42 |
| Yes | 582 | 81.11 | 26.32 | 90.38 | 9.62 |
| Population (a) | |||||
| Offline | 433 | 72.53 | 28.45 | 77.60 | 22.40 |
| Online | 2557 | 76.10 | 30.56 | 87.99 | 12.01 |
| Age group | |||||
| 18–24 years | 239 | 58.44 | 35.33 | 93.31 | 6.69 |
| 25–34 years | 403 | 67.75 | 33.61 | 91.07 | 8.93 |
| 35–44 years | 418 | 71.10 | 32.76 | 89.71 | 10.29 |
| 45–54 years | 518 | 75.56 | 29.42 | 87.07 | 12.93 |
| 55–64 years | 636 | 79.62 | 28.53 | 85.53 | 14.47 |
| 65–74 years | 532 | 84.83 | 23.35 | 82.71 | 17.29 |
| 75 or more years | 237 | 82.20 | 22.28 | 75.95 | 24.05 |
| Socio-economic indexes for areas | |||||
| Quartile 1 | 417 | 76.69 | 30.08 | 88.73 | 11.27 |
| Quartile 2 | 520 | 76.76 | 29.88 | 85.00 | 15.00 |
| Quartile 3 | 570 | 76.55 | 29.58 | 88.95 | 11.05 |
| Quartile 4 | 635 | 75.28 | 30.67 | 84.88 | 15.12 |
| Quartile 5 | 822 | 75.50 | 29.59 | 86.25 | 13.75 |
| Age, mean with SD | 2949 | | | 50.31 (17.18) | 56.89 (16.62) |
| Survey response %, mean with SD | 2990 | | | 78.65 (29.61) | 55.93 (27.07) |
(a) At profile survey (before first wave for panellist)
Multiple linear regression (survey completion rates) and logistic regression (voluntary attrition) results, socio-demographic predictors, waves 1–30, 2872 persons
| Predictor | Survey completion rate: Coef. | Survey completion rate: p value | Voluntary attrition: Coef. | Voluntary attrition: p value |
|---|---|---|---|---|
| Gender | | | | |
| Female | 0 | | 0 | |
| Male | − 1.70 | 0.113 | − 0.03 | 0.811 |
| Education | | | | |
| Bachelor or higher | 0 | | 0 | |
| Certificate/diploma/trade | − 6.96 | < 0.001** | 0.14 | 0.310 |
| Year 12 or equivalent | − 3.86 | 0.036* | − 0.05 | 0.814 |
| Year 11 or less | − 9.29 | < 0.001** | 0.37 | 0.029* |
| Capital city in state | | | | |
| No | 0 | | 0 | |
| Yes | 1.43 | 0.263 | 0.05 | 0.718 |
| Born in Australia | | | | |
| No | 0 | | 0 | |
| Yes | 1.95 | 0.141 | − 0.08 | 0.571 |
| Only English spoken at home | | | | |
| No | 0 | | 0 | |
| Yes | 6.32 | < 0.001** | 0.19 | 0.341 |
| Indigenous status | | | | |
| No | 0 | | 0 | |
| Yes | − 3.38 | 0.362 | − 0.44 | 0.358 |
| Other healthcare card | | | | |
| No | 0 | | 0 | |
| Yes | − 0.36 | 0.779 | − 0.28 | 0.041* |
| Carer status | | | | |
| No | 0 | | 0 | |
| Yes | 4.27 | 0.002** | − 0.59 | < 0.001** |
| Population | | | | |
| Offline | 0 | | 0 | |
| Online | 8.96 | < 0.001** | − 0.57 | < 0.001** |
| SEIFA | | | | |
| Quartile 1 | 0.12 | 0.947 | 0.06 | 0.791 |
| Quartile 2 | − 1.15 | 0.513 | 0.30 | 0.110 |
| Quartile 3 | 0.00 | | 0 | |
| Quartile 4 | − 1.62 | 0.337 | 0.43 | 0.019* |
| Quartile 5 | − 3.83 | 0.023* | 0.32 | 0.086 |
| Age | 0.45 | < 0.001** | 0.02 | < 0.001** |
| Constant | 43.99 | < 0.001** | − 2.84 | < 0.001** |
| Adjusted R-squared | 0.085 | | | |
| Pseudo R-squared | | | 0.044 | |
Coef. = model regression coefficient
* Significant at the 0.05 level
** Significant at the 0.01 level
Fig. 1. Predictive power for response and nonresponse combined, paradata prediction with and without socio-demographics, waves 4–30 (Accuracy)
Fig. 2. Predictive power for nonresponse, paradata prediction with and without socio-demographics, waves 4–30 (Recall)
Fig. 3. Predictive power for nonresponse, pooled logit and random-effect logit regressions, waves 4–30 (Accuracy)
Fig. 4. Predictive power for response and nonresponse combined, pooled logit and random-effect logit regressions, waves 4–30 (Recall)
Fig. 5. The relationship between recall and precision, “cost–benefit” analysis (wave 16, n = 2727)
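The figure captions refer to accuracy, recall, and precision. As a reminder of how these measures relate, here is a minimal Python sketch that computes them from actual and predicted nonresponse indicators (1 = nonresponse, 0 = completion). The function name and the toy vectors are assumptions for illustration and are not results from the study.

```python
from typing import Dict, Sequence


def classification_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall and precision with nonresponse (1) as the positive class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # nonrespondents flagged
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # nonrespondents missed
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # respondents flagged in error
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # respondents correctly passed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Toy data (not study results): 3 actual nonrespondents among 8 panellists.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(actual, predicted))
```

Recall is the share of actual nonrespondents that the model flags, which corresponds to the abstract's "almost four out of five nonrespondents" statement. Precision is the share of flagged panellists who really are nonrespondents, so lowering the classification threshold typically trades precision for recall.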