Mark E McGovern, David Canning, Till Bärnighausen.
Abstract
Standard corrections for missing data rely on the strong and generally untestable assumption of missing at random. Heckman-type selection models relax this assumption, but have been criticized because they typically require a selection variable which predicts non-response but not the outcome of interest, and can impose bivariate normality. In this paper we illustrate an application using a copula methodology which does not rely on bivariate normality. We implement this approach in data on HIV testing at a demographic surveillance site in rural South Africa which are affected by non-response. Randomized incentives are the ideal selection variable, particularly when implemented ex ante to deal with potential missing data. However, elements of survey design may also provide a credible method of correcting for non-response bias ex post. For example, although not explicitly randomized, allocation of food gift vouchers during our survey was plausibly exogenous and substantially raised participation, as did effective survey interviewers. Based on models with receipt of a voucher and interviewer identity as selection variables, our results imply that 37% of women in the population under study are HIV positive, compared to imputation-based estimates of 28%. For men, confidence intervals are too wide to reject the absence of non-response bias. Consistent results obtained when comparing different selection variables and error structures strengthen these conclusions. Our application illustrates the feasibility of the selection model approach when combined with survey metadata.
Keywords: HIV; Non-ignorable missing data; Participation incentives; Selection bias; Selection models; Survey design
Year: 2018 PMID: 30294055 PMCID: PMC6167756 DOI: 10.1016/j.econlet.2018.07.040
Source DB: PubMed Journal: Econ Lett ISSN: 0165-1765
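The selection problem described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the joint probability of consenting and being HIV positive is given by a Frank copula applied to the two marginal probabilities; the function names and the dependence parameters `theta` are hypothetical and are not taken from the paper's estimation code.

```python
import math


def frank_copula(u: float, v: float, theta: float) -> float:
    """Frank copula C(u, v; theta); theta = 0 is treated as independence."""
    if theta == 0.0:
        return u * v
    # expm1/log1p keep the formula numerically stable for small |theta|.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def prevalence_among_participants(p_consent: float, p_hiv: float, theta: float) -> float:
    """P(HIV+ | consented), assuming P(consented, HIV+) = C(p_consent, p_hiv; theta)."""
    return frank_copula(p_consent, p_hiv, theta) / p_consent


if __name__ == "__main__":
    # 45% consent (women) and 37% prevalence are used here as illustrative inputs.
    p_consent, p_hiv = 0.45, 0.37
    for theta in (-3.0, 0.0, 3.0):  # hypothetical dependence parameters
        observed = prevalence_among_participants(p_consent, p_hiv, theta)
        print(f"theta={theta:+.1f}: prevalence among testers = {observed:.3f}")
```

Under this assumed parameterisation, negative dependence (`theta < 0`) makes prevalence among those who test lower than prevalence in the population, which is the direction of bias a complete-case analysis would then inherit.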
Participation in HIV Testing and HIV Prevalence at the 2010 AHRI Surveillance Cohort.
| | Women: No. | Women: % | Men: No. | Men: % |
|---|---|---|---|---|
| Refused to Test | 9,357 | 55 | 7,210 | 67 |
| Consented to Test | 7,590 | 45 | 3,527 | 33 |
| Total | 16,947 | 100 | 10,737 | 100 |

| | Women (%) | Men (%) |
|---|---|---|
| Did not receive gift voucher: consented to test | 42 | 31 |
| Received gift voucher: consented to test | 58 | 41 |

| | Women | Men |
|---|---|---|
| HIV Prevalence (%) | 27 | 16 |
| 95% CI for Nonparametric Bounds (%) | 12–68 | 5–73 |
Note: HIV prevalence estimates are based on those who participated in testing. Confidence intervals for nonparametric bounds are based on Horowitz and Manski 2006.
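The width of the nonparametric bounds follows from the participation rates. A minimal sketch of the worst-case point bounds is given below, assuming that all refusers are HIV negative for the lower bound and HIV positive for the upper bound. The function name is hypothetical, the inputs are the rounded figures from the table, and the sketch does not reproduce the confidence-interval procedure of Horowitz and Manski, nor does it adjust for indeterminate test results.

```python
def worst_case_bounds(prevalence_tested: float, n_tested: int, n_refused: int) -> tuple[float, float]:
    """Bounds on population prevalence with no assumption about refusers."""
    n_total = n_tested + n_refused
    share_tested = n_tested / n_total
    # Lower bound: every refuser is HIV negative; upper bound: every refuser is HIV positive.
    lower = prevalence_tested * share_tested
    upper = lower + (1.0 - share_tested)
    return lower, upper


if __name__ == "__main__":
    # Counts and complete-case prevalence taken from the table above.
    for label, prev, tested, refused in [("Women", 0.27, 7590, 9357), ("Men", 0.16, 3527, 7210)]:
        lo, hi = worst_case_bounds(prev, tested, refused)
        print(f"{label}: {100 * lo:.0f}% to {100 * hi:.0f}%")
```

With these inputs the point bounds come out at roughly 12% to 67% for women and 5% to 72% for men, slightly inside the reported 95% confidence intervals, as would be expected once sampling uncertainty is added.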
Fig. 1. AHRI 2010 HIV Prevalence by Gift Voucher Receipt. Note: For each group (received a gift voucher or did not receive a gift voucher), the HIV prevalence rate is calculated as the number of HIV positive respondents among those who participated in testing in that group divided by the number of respondents who participated in testing in that group. 95% confidence intervals are shown.
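The per-group calculation in the figure note can be sketched as follows. The counts are hypothetical and for illustration only, and the normal-approximation (Wald) interval is an assumption: the source does not state which interval method the figure uses.

```python
import math


def prevalence_with_ci(n_positive: int, n_tested: int, z: float = 1.96) -> tuple[float, float, float]:
    """Group prevalence and a normal-approximation (Wald) 95% interval."""
    p = n_positive / n_tested
    half_width = z * math.sqrt(p * (1.0 - p) / n_tested)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; they are not the study's data.
    groups = {"received voucher": (30, 100), "no voucher": (250, 1000)}
    for name, (pos, tested) in groups.items():
        p, lo, hi = prevalence_with_ci(pos, tested)
        print(f"{name}: {100 * p:.1f}% ({100 * lo:.1f} to {100 * hi:.1f})")
```

The smaller group gets the wider interval, so a difference in prevalence between voucher recipients and non-recipients is harder to detect when few respondents received a voucher.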
Results for HIV Prevalence (Women).
| Model | Selection Variable | HIV Prevalence (%) | 95% CI | Gamma Association | Copula |
|---|---|---|---|---|---|
| Complete Case | | 27 | 26–28 | | |
| Imputation | | 28 | 28–29 | | |
| Normal Selection Model | Interviewers | 35 | 31–39 | −0.42 | Gaussian |
| Normal Selection Model | Gift voucher | 33 | 23–42 | −0.26 | Gaussian |
| Normal Selection Model | Interviewers + Gift voucher | 35 | 31–39 | −0.40 | Gaussian |
| Copula Selection Model | Interviewers | 37 | 33–41 | −0.49 | Frank |
| Copula Selection Model | Gift voucher | 39 | 31–47 | −0.54 | Frank |
| Copula Selection Model | Interviewers + Gift voucher | 37 | 33–41 | −0.46 | Frank |
Note: The following variables are included as predictors of consent to test for HIV and HIV status: age, month of interview, location of residence, urban/rural/peri-urban type of residence, distance to nearest clinic, distance to nearest secondary school, distance to nearest primary school, distance to nearest level 1 road, distance to nearest level 2 road, marital status, education, mother/father alive, electricity in home, fuel in home, toilet in home, water in home, and household asset index. The first row is the mean prevalence among the sample who consent to test and have a valid HIV test (complete case analysis). The second row imputes HIV prevalence for those who refused to test using the covariates described above. Row 3 implements a Heckman selection model for HIV status and consent to an HIV test using interviewer fixed effects. In row 4 the Heckman selection model uses a binary indicator for whether the respondent received the food gift voucher. The model in row 5 uses interviewers and the gift voucher intervention as exclusion restriction variables. Ninety-four respondents who consented to test for HIV but received indeterminate results were excluded from the procedure for estimating HIV prevalence. The copula model shown is the model with the best fit, as defined by the Akaike Information Criterion (AIC). Tables show the gamma rank association measure in column 5.
Results for HIV Prevalence (Men).
| Model | Selection Variable | HIV Prevalence (%) | 95% CI | Gamma Association | Copula |
|---|---|---|---|---|---|
| Complete Case | | 16 | 14–17 | | |
| Imputation | | 17 | 17–18 | | |
| Normal Selection Model | Interviewers | 20 | 14–26 | −0.22 | Gaussian |
| Normal Selection Model | Gift voucher | 28 | 12–43 | −0.55 | Gaussian |
| Normal Selection Model | Interviewers + Gift voucher | 21 | 16–27 | −0.29 | Gaussian |
| Copula Selection Model | Interviewers | 21 | 16–25 | −0.20 | Joe 90 |
| Copula Selection Model | Gift voucher | 33 | 19–46 | −0.68 | Frank |
| Copula Selection Model | Interviewers + Gift voucher | 21 | 17–25 | −0.24 | Joe 90 |
Note: The following variables are included as predictors of consent to test for HIV and HIV status: age, month of interview, location of residence, urban/rural/peri-urban type of residence, distance to nearest clinic, distance to nearest secondary school, distance to nearest primary school, distance to nearest level 1 road, distance to nearest level 2 road, marital status, education, mother/father alive, electricity in home, fuel in home, toilet in home, water in home, and household asset index. The first row is the mean prevalence among the sample who consent to test and have a valid HIV test (complete case analysis). The second row imputes HIV prevalence for those who refused to test using the covariates described above. Row 3 implements a Heckman selection model for HIV status and consent to an HIV test using interviewer fixed effects. In row 4 the Heckman selection model uses a binary indicator for whether the respondent received the food gift voucher. The model in row 5 uses interviewers and the gift voucher intervention as exclusion restriction variables. Ninety-four respondents who consented to test for HIV but received indeterminate results were excluded from the procedure for estimating HIV prevalence. The copula model shown is the model with the best fit, as defined by the Akaike Information Criterion (AIC). Tables show the gamma rank association measure in column 5.
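The AIC-based choice of copula mentioned in the table notes can be sketched as below. The log-likelihoods and parameter counts are hypothetical placeholders, not values from the paper; only the copula names are taken from the tables.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits: dict[str, tuple[float, int]]) -> str:
    """Return the name of the candidate with the smallest AIC."""
    return min(fits, key=lambda name: aic(*fits[name]))


if __name__ == "__main__":
    # Hypothetical (log-likelihood, parameter count) pairs, not the paper's fits.
    fits = {"Gaussian": (-1050.0, 12), "Frank": (-1046.5, 12), "Joe 90": (-1049.0, 12)}
    print(best_by_aic(fits))
```

When all candidates have the same number of parameters, as assumed here, ranking by AIC reduces to ranking by log-likelihood, so the selected copula is simply the one that fits the joint distribution of consent and HIV status best.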