Lisa Avery, Nooshin Rotondi, Constance McKnight, Michelle Firestone, Janet Smylie, Michael Rotondi.
Abstract
BACKGROUND: It is unclear whether weighted or unweighted regression is preferred in the analysis of data derived from respondent-driven sampling (RDS). Our objective was to evaluate the validity of various regression models, with and without weights and with various controls for clustering, in the estimation of the risk of group membership from data collected using RDS.
Year: 2019 PMID: 31664912 PMCID: PMC6819607 DOI: 10.1186/s12874-019-0842-5
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Illustration of study workflow
Fig. 2 Simulated RDS sample from a population with homophily of 1.5 and population prevalence of 10%. Red dots indicate the seeds and blue dots are members of Group 1
Summary of regression model performance across all populations
| Model | No. | Weight | Clusters | Ψ | SE Adj. | Error | Coverage | Bias (mean %) | Bias (median %) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Logistic Regression | | | | | | | | | | |
| Generalised Linear Models | | | | | | | | | | |
| glm(R) | 1 | – | | | | 0.04 | 0.954 | 2.07 | −1.63 | 88.1 |
| | 2 | RDS-II | | | | 0.55 | 0.442 | 20.89 | 8.51 | |
| | 3 | – | R-y | | | 0.04 | 0.955 | 3.35 | −0.48 | 88.6 |
| | 4 | RDS-II | R-y | | | 0.55 | 0.443 | 25.56 | 11.57 | |
| surveylogistic (SAS) | 5 | – | | | | 0.05 | 0.952 | 2.07 | −1.63 | 88.1 |
| | 6 | RDS-II | | | | 0.07 | 0.903 | 20.88 | 8.51 | |
| | 7 | – | | | Morel | 0.05 | 0.953 | 2.07 | −1.63 | 88.1 |
| | 8 | RDS-II | | | Morel | 0.07 | 0.904 | 20.88 | 8.51 | |
| | 9 | RDS-II | RwS | | | 0.07 | 0.903 | 20.88 | 8.51 | |
| | 10 | RDS-II | RwS | | Morel | 0.07 | 0.904 | 20.88 | 8.51 | |
| Generalised Linear Mixed Models | | | | | | | | | | |
| glmer(R) | 11 | – | S | U | | 0.05 | 0.954 | 3.48 | −0.46 | 88.1 |
| | 12 | RDS-II | S | U | | 0.55 | 0.402 | 44.55 | 26.73 | |
| glimmix (SAS) | 13 | – | S | AR | | 0.04 | 0.955 | 3.45 | −0.34 | 88.1 |
| glimmix (SAS) | 14 | – | R | CS | | 0.04 | 0.957 | 2.4 | −1.19 | 88.1 |
| glmmPQL(R) | 15 | – | S | DC | | 0.04 | 0.865 | −0.86 | −6.34 | |
| Generalised Estimating Equations | | | | | | | | | | |
| geeglm(R) | 16 | – | R | I | Classical | 0.13 | 0.952 | 2.07 | −1.63 | |
| | 17 | RDS-II | R | I | Classical | 0.16 | 0.902 | 20.89 | 8.51 | |
| glimmix (SAS) | 18 | – | S | AR | | 0.04 | 0.939 | 1.85 | −1.69 | |
| | 19 | – | R | CS | | 0.04 | 0.937 | 2.52 | −1.75 | |
| | 20 | – | R | CS | Classical | 0.05 | 0.948 | 2.52 | −1.75 | |
| | 21 | – | R | CS | FIRORES | 0.05 | 0.950 | 2.52 | −1.75 | 88.1 |
| | 22 | – | R | CS | FIROEEQ | 0.05 | 0.951 | 2.52 | −1.75 | 88.1 |
| | 23 | – | R | CS | MBN | 0.05 | 0.950 | 2.52 | −1.75 | |
| Poisson Regression | | | | | | | | | | |
| Generalised Linear Models | | | | | | | | | | |
| glm(R) | 24 | – | | | | 0.02 | 0.962 | 4.81 | 4.15 | 86 |
| glm(R) | 25 | RDS-II | | | | 0.49 | 0.457 | 9.48 | 8.23 | |
| glm(R) | 26 | – | R-y | | | 0.02 | 0.964 | 3.06 | 2.44 | 86.3 |
| glm(R) | 27 | RDS-II | R-y | | | 0.47 | 0.493 | 7.74 | 6.46 | |
| Generalised Linear Mixed Models | | | | | | | | | | |
| glmer(R) | 28 | – | S | U | | 0.02 | 0.963 | 4.92 | 4.27 | 86 |
| | 29 | RDS-II | S | U | | 0.47 | 0.431 | 11.71 | 10.42 | |
| Generalised Estimating Equations | | | | | | | | | | |
| geeglm(R) | 30 | – | R | I | Classical | 0.13 | 0.859 | 4.81 | 4.15 | |
| | 31 | RDS-II | R | I | Classical | 0.17 | 0.781 | 9.48 | 8.23 | |
R-y: recruiter outcome as covariate; S: seeds; R: recruiter; RwS: recruiter within seed
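The Error, Coverage, and Bias columns in the table above summarise results over simulation replicates. The following is a minimal sketch of how such summaries might be computed, not the authors' code. It assumes that Error is the share of replicates rejecting a true null hypothesis at α = 0.05, that Coverage is the share of confidence intervals containing the true value, and that percent bias is 100 × (estimate − truth) / truth. These definitions, the function names, and the toy numbers are illustrative assumptions.

```python
from statistics import mean, median


def rejection_rate(p_values, alpha=0.05):
    """Share of replicates with p < alpha.

    When the null hypothesis is true, this is an empirical type I error rate.
    The alpha of 0.05 is an assumption, not a value stated in the source.
    """
    return mean([1.0 if p < alpha else 0.0 for p in p_values])


def coverage(intervals, truth):
    """Share of (lower, upper) confidence intervals that contain the true value."""
    return mean([1.0 if lo <= truth <= hi else 0.0 for lo, hi in intervals])


def percent_bias(estimates, truth):
    """Mean and median of the relative error, in percent (assumed definition)."""
    relative = [100.0 * (est - truth) / truth for est in estimates]
    return mean(relative), median(relative)


if __name__ == "__main__":
    # Hypothetical replicate results, for illustration only.
    true_value = 2.0
    estimates = [1.8, 2.0, 2.2, 2.4]
    intervals = [(1.5, 2.1), (1.9, 2.3), (2.1, 2.5), (1.7, 2.6)]
    null_p_values = [0.01, 0.20, 0.50, 0.80]

    print("type I error:", rejection_rate(null_p_values))
    print("coverage:", coverage(intervals, true_value))
    print("bias (mean %, median %):", percent_bias(estimates, true_value))
```

Reporting both the mean and the median bias, as the table does, helps show when a few extreme replicates dominate the mean.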
Population and mean sample characteristics for each simulated population
| Population | Prevalence | Homophily | Odds ratio | Relative risk | Degree | Number of waves | Recruits per seed | Sampling correlation (a) |
|---|---|---|---|---|---|---|---|---|
| 1 | 10% | 1.00 | 7.59 | 2.86 | 44.4 | 8.4 | 57.5 | 0.899 |
| 2 | 10% | 1.10 | 7.65 | 2.88 | 43.5 | 8.3 | 57.2 | 0.895 |
| 3 | 10% | 1.25 | 7.22 | 2.84 | 44.2 | 8.4 | 57.0 | 0.900 |
| 4 | 10% | 1.50 | 6.93 | 2.85 | 43.7 | 8.3 | 56.9 | 0.896 |
| 5 | 30% | 1.00 | 7.47 | 2.05 | 43.8 | 8.1 | 55.9 | 0.896 |
| 6 | 30% | 1.10 | 7.56 | 2.05 | 43.4 | 8.1 | 55.6 | 0.891 |
| 7 | 30% | 1.25 | 7.47 | 2.05 | 44.4 | 8.2 | 55.9 | 0.894 |
| 8 | 30% | 1.50 | 7.59 | 2.06 | 44.2 | 8.2 | 56.3 | 0.894 |
| 9 | 50% | 1.00 | 7.47 | 1.68 | 43.6 | 8.2 | 55.6 | 0.890 |
| 10 | 50% | 1.10 | 7.55 | 1.68 | 43.5 | 8.1 | 55.6 | 0.890 |
| 11 | 50% | 1.25 | 7.50 | 1.69 | 44.2 | 8.2 | 55.3 | 0.892 |
| 12 | 50% | 1.50 | 7.51 | 1.69 | 44.0 | 8.2 | 55.9 | 0.893 |
(a) Correlation between network degree and sampling frequency
Fig. 3 Prediction accuracy of the unweighted binomial (model 1) and Poisson (model 24) models for the populations with homophily of 1
Outcome prevalence estimates using various estimators across populations
| Outcome prevalence: | 10% | 10% | 10% | 10% | 30% | 30% | 30% | 30% | 50% | 50% | 50% | 50% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Homophily: | 1.00 | 1.10 | 1.25 | 1.50 | 1.00 | 1.10 | 1.25 | 1.50 | 1.00 | 1.10 | 1.25 | 1.50 |
| Mean outcome prevalence | ||||||||||||
| naive | 0.09 | 0.09 | 0.09 | 0.09 | 0.27 | 0.27 | 0.27 | 0.27 | 0.47 | 0.47 | 0.47 | 0.46 |
| RDS-I | 0.08 | 0.08 | 0.08 | 0.08 | 0.27 | 0.26 | 0.26 | 0.26 | 0.47 | 0.47 | 0.46 | 0.46 |
| RDS-II | 0.08 | 0.08 | 0.08 | 0.08 | 0.27 | 0.26 | 0.26 | 0.26 | 0.47 | 0.47 | 0.46 | 0.46 |
| surveylogistic models | ||||||||||||
| unweighted | 0.09 | 0.09 | 0.09 | 0.09 | 0.27 | 0.27 | 0.27 | 0.27 | 0.47 | 0.47 | 0.47 | 0.46 |
| weighted (RDS-II) | 0.08 | 0.08 | 0.08 | 0.08 | 0.27 | 0.26 | 0.26 | 0.26 | 0.47 | 0.46 | 0.46 | 0.45 |
| Mean SD of outcome prevalence | ||||||||||||
| naive | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 | 0.02 | 0.03 | 0.03 |
| RDS-I | 0.02 | 0.02 | 0.02 | 0.03 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.05 | 0.05 | 0.05 |
| RDS-II | 0.02 | 0.02 | 0.02 | 0.03 | 0.04 | 0.04 | 0.04 | 0.05 | 0.04 | 0.05 | 0.05 | 0.05 |
| surveylogistic models | ||||||||||||
| unweighted | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 | 0.02 | 0.03 | 0.03 |
| weighted (RDS-II) | 0.02 | 0.02 | 0.02 | 0.03 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.05 | 0.05 | 0.05 |
| Estimator coverage rates | ||||||||||||
| naive | 0.845 | 0.827 | 0.802 | 0.708 | 0.646 | 0.740 | 0.620 | 0.642 | 0.742 | 0.687 | 0.634 | 0.551 |
| RDS-I | 0.545 | 0.554 | 0.548 | 0.578 | 0.572 | 0.512 | 0.524 | 0.501 | 0.627 | 0.610 | 0.569 | 0.511 |
| RDS-II | 0.772 | 0.776 | 0.766 | 0.749 | 0.799 | 0.761 | 0.744 | 0.723 | 0.839 | 0.831 | 0.791 | 0.741 |
| surveylogistic models | ||||||||||||
| unweighted | 0.916 | 0.900 | 0.875 | 0.784 | 0.657 | 0.745 | 0.611 | 0.645 | 0.747 | 0.684 | 0.644 | 0.544 |
| weighted (RDS-II) | 0.828 | 0.819 | 0.799 | 0.769 | 0.825 | 0.779 | 0.778 | 0.753 | 0.862 | 0.835 | 0.819 | 0.756 |
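To make the difference between the naive and RDS-II rows concrete, here is a minimal sketch of the two point estimators. It assumes the standard Volz-Heckathorn form of RDS-II, in which each respondent is weighted by the inverse of their reported network degree. The function names and toy data are hypothetical, and the source's own implementation may differ.

```python
def naive_prevalence(outcomes):
    """Unweighted sample proportion of a binary (0/1) outcome."""
    return sum(outcomes) / len(outcomes)


def rds2_prevalence(outcomes, degrees):
    """RDS-II (Volz-Heckathorn) estimate: inverse-degree weighted proportion.

    Computes sum(y_i / d_i) / sum(1 / d_i), where d_i is the reported
    network degree of respondent i.
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Hypothetical sample in which outcome-positive respondents have larger degrees.
    outcomes = [1, 1, 0, 0]
    degrees = [10, 20, 5, 5]
    print("naive:", naive_prevalence(outcomes))
    print("RDS-II:", rds2_prevalence(outcomes, degrees))
```

In this toy sample the outcome-positive respondents have higher degree, so inverse-degree weighting pulls the estimate below the naive proportion. When all degrees are equal, the two estimators coincide.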
Type I error rate of unweighted and weighted regression models for populations with correlation between outcome and network degree
| Secondary analysis population | Correlation of degree and outcome | Binomial regression, unweighted | Binomial regression, weighted | Poisson regression, unweighted | Poisson regression, weighted |
|---|---|---|---|---|---|
| Population homophily = 1.25 | |||||
| 1 | extreme negative ( | 0.043 | 0.548 | 0.037 | 0.455 |
| 2 | extreme positive ( | 0.048 | 0.003 | 0.037 | 0.003 |
| 3 | moderate negative ( | 0.062 | 0.498 | 0.049 | 0.445 |
| 4 | moderate positive ( | 0.046 | 0.241 | 0.032 | 0.229 |
| Population homophily = 1.50 | |||||
| 5 | extreme negative ( | 0.037 | 0.529 | 0.029 | 0.412 |
| 6 | extreme positive ( | 0.054 | 0.006 | 0.043 | 0.006 |
| 7 | moderate negative ( | 0.037 | 0.459 | 0.025 | 0.418 |
| 8 | moderate positive ( | 0.024 | 0.186 | 0.020 | 0.175 |