Jacques Muthusi, Samuel Mwalili, Peter Young.
Abstract
INTRODUCTION: Interest in reproducible research is growing in the research community. Automating the production of manuscript tables from statistical software can help increase the reproducibility of findings. Logistic regression is used to study disease prevalence and associated factors in epidemiological studies and can be easily performed using widely available software including SAS, SUDAAN, Stata or R. However, output from these packages must be processed further to make it readily presentable. A number of procedures have been developed to organize regression output, though many of them are limited by inflexibility, complexity, a lack of validation checks for input parameters, and an inability to incorporate survey design.
Year: 2019 PMID: 31479445 PMCID: PMC6719830 DOI: 10.1371/journal.pone.0214262
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Input parameters for %svy_unilogit, %svy_multilogit and %svy_printlogit macros.
| Parameter | Description |
|---|---|
| dataset | name of input dataset |
| outcome | name of dependent binary variable of interest e.g., hiv_status |
| outevent | value label of the outcome variable (without quotation marks) to model, e.g., Positive, in the case of modeling Hepatitis A risk factors |
| catvars | list of categorical explanatory variables (nominal or ordinal) separated by spaces, e.g., age category (in years), which takes the categories: 1 = “20–39”, 2 = “40–59”, 3 = “> = 60” |
| class | class statement for categorical explanatory variables specifying the baseline (reference) category e.g., age_category (ref = “> = 60”) |
| contvars | list of continuous explanatory variables separated by space e.g., Age (in years) which takes values from 20 to 70 years |
| condition | (optional) any conditional statements to create and/or fine-tune the final analysis dataset, specified using one IF statement |
| strata | (optional) survey stratification variable |
| cluster | (optional) survey clustering variable |
| weight | (optional) survey weighting variable |
| domain | (optional) domain variable for sub-population analysis |
| missval_lab | (optional) value label for missing values. If missing data have a format, it should be provided; otherwise the macro assumes the default format “.” |
| varmethod | (optional) variance estimation method: Taylor (the default) or a replication-based method, JK or BRR |
| rep_weights_values | (optional) values for the REPWEIGHTS statement; may be specified when the replication-based variance estimation method is JK or BRR |
| varmethod_opts | (optional) options for variance estimation method, e.g., jkcoef = 1 df = 25 for JK |
| missval_opts | (optional) options for handling missing data within proc survey statement, e.g., “MISSING” or “NOMCAR”. If no option is specified missing observations are excluded from the analysis |
| | variable for displaying/suppressing the output table on the output window, which takes the values (NO = suppress output, YES = show output) |
| tablename | short name of output table |
| tabletitle | title of output table |
| outcome & outevent | same as defined in |
| outdir | directory for saving output files |
| syserr | SAS built-in macro variable that checks for the presence of any system errors |
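The abstract names validation checks for input parameters as something many existing procedures lack. The following is a minimal Python sketch of what such checks could look like for the parameters in the table above. It is not the macros' SAS implementation: the function name `check_params`, the choice of which parameters count as required, and the rule that at least one of `catvars` or `contvars` must be supplied are all assumptions made for illustration.

```python
# Hypothetical illustration only; the published macros are written in SAS.
ALLOWED_VARMETHODS = {"TAYLOR", "JK", "BRR"}    # methods named in the table
REQUIRED = ("dataset", "outcome", "outevent")   # assumed to be mandatory


def check_params(params):
    """Return a list of problems found in a parameter dict (empty if none)."""
    problems = []

    # Required parameters must be present and non-blank.
    for name in REQUIRED:
        if not str(params.get(name) or "").strip():
            problems.append(f"missing required parameter: {name}")

    # Assumption: a model needs at least one explanatory variable.
    has_cat = str(params.get("catvars") or "").strip()
    has_cont = str(params.get("contvars") or "").strip()
    if not (has_cat or has_cont):
        problems.append("at least one of catvars or contvars must be given")

    # varmethod defaults to Taylor when omitted, as stated in the table.
    varmethod = str(params.get("varmethod") or "TAYLOR").upper()
    if varmethod not in ALLOWED_VARMETHODS:
        problems.append(f"unknown varmethod: {varmethod}")

    # Replicate weights only make sense with a replication-based method.
    if params.get("rep_weights_values") and varmethod == "TAYLOR":
        problems.append("rep_weights_values requires varmethod JK or BRR")

    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass, which suits a long parameter list like this one.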
Fig 1. Sample %svy_logistic_regression macro call.
Output of simple logistic regression model results from %svy_unilogit macro.
| Factor | N | Freq | OR_CI | p_value | g_p_value |
|---|---|---|---|---|---|
| Gender | |||||
| Male | 473 | 205 (37.7) | ref | ||
| Female | 35 | 15 (39.7) | 1.09 (0.40–2.92) | 0.857 | 0.857 |
| Total | 508 | 220 (37.9) | |||
| Age category in years at screening | |||||
| > = 60 | 322 | 131 (30.7) | ref | ||
| 20–39 | 51 | 39 (83.0) | 11.0 (5.10–23.7) | < .001 | < .001 |
| 40–59 | 135 | 50 (31.2) | 1.02 (0.58–1.79) | 0.934 | |
| Total | 508 | 220 (37.9) | |||
| Race/Hispanic origin | |||||
| Non-Hispanic White | 307 | 114 (34.1) | ref | ||
| Mexican American | 23 | 14 (67.1) | 3.93 (1.14–13.5) | 0.018 | < .001 |
| Non-Hispanic Black | 126 | 59 (46.1) | 1.65 (1.12–2.43) | 0.006 | |
| Other Hispanic | 26 | 17 (64.3) | 3.47 (0.92–13.1) | 0.046 | |
| Other Race | 26 | 16 (52.3) | 2.12 (0.60–7.53) | 0.207 | |
| Total | 508 | 220 (37.9) | |||
| Served in a foreign country | |||||
| No | 243 | 86 (27.4) | ref | ||
| Yes | 264 | 134 (48.6) | 2.51 (1.32–4.77) | 0.002 | 0.002 |
| Total | 507 | 220 (38.1) | |||
| Education level | |||||
| College graduate or above | 147 | 51 (32.0) | ref | ||
| 9-11th grade | 37 | 16 (33.4) | 1.07 (0.50–2.28) | 0.857 | 0.296 |
| High school graduate | 122 | 56 (41.1) | 1.48 (0.83–2.65) | 0.147 | |
| Less than 9th grade | 9 | 5 (42.6) | 1.58 (0.42–6.02) | 0.465 | |
| Some college or AA degree | 193 | 92 (41.8) | 1.53 (0.90–2.61) | 0.089 | |
| Total | 508 | 220 (37.9) | |||
| Marital status | |||||
| Separated | 11 | 5 (27.3) | ref | ||
| Divorced | 75 | 30 (29.4) | 1.11 (0.36–3.42) | 0.847 | 0.029 |
| Living with partner | 17 | 9 (60.2) | 4.02 (0.81–19.8) | 0.063 | |
| Married | 311 | 130 (36.3) | 1.52 (0.53–4.37) | 0.403 | |
| Never married | 48 | 21 (51.4) | 2.81 (0.72–10.9) | 0.104 | |
| Widowed | 46 | 25 (44.0) | 2.09 (0.66–6.62) | 0.175 | |
| Total | 508 | 220 (37.9) | |||
| Age in years at screening | 508 | 220 (37.9) | 0.97 (0.95–0.98) | < .001 | < .001 |
N = Total number of observations
Freq = Frequency of prevalent cases (and weighted prevalence in percent)
OR_CI = Weighted odds ratio (95% confidence interval)
p_value = Class-level p-value
g_p_value = Global/Type 3 p-value
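The odds ratio, confidence interval and p-value cells above follow a consistent text format. As a rough sketch of how such cells can be derived from a fitted log-odds coefficient and its standard error, the Python below exponentiates the coefficient and its Wald limits and then formats the result. The function names are hypothetical, the rounding rule (two decimals below 10, one decimal otherwise) is inferred from the table rather than stated by the authors, and the normal critical value 1.96 is an assumption; survey software may use a t critical value with design-based degrees of freedom instead.

```python
import math


def format_number(x):
    """Two decimals below 10, one decimal otherwise (rule inferred from the table)."""
    return f"{x:.2f}" if round(x, 2) < 10 else f"{x:.1f}"


def format_or_ci(beta, se, crit=1.96):
    """Format exp(beta) with a Wald interval as 'OR (lower–upper)'.

    crit=1.96 is an assumed normal critical value for a 95% interval.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - crit * se)
    upper = math.exp(beta + crit * se)
    return f"{format_number(odds_ratio)} ({format_number(lower)}–{format_number(upper)})"


def format_p(p):
    """Show very small p-values as '< .001', otherwise three decimals."""
    return "< .001" if p < 0.001 else f"{p:.3f}"


# Hypothetical coefficient and standard error, not taken from the paper.
print(format_or_ci(0.0, 0.5))   # 1.00 (0.38–2.66)
print(format_p(0.0004))         # < .001
```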
Quality publication-ready output from the %svy_printlogit macro combining results from %svy_unilogit and %svy_multilogit macros.
| Characteristic | Total | Hepatitis A antibody: Positive | Unadjusted OR (95% CI) | Unadjusted p-value | Unadjusted Type3 | Adjusted OR (95% CI) | Adjusted p-value | Adjusted Type3 |
|---|---|---|---|---|---|---|---|---|
| Gender | ||||||||
| Male | 473 | 205 (37.7) | ref | |||||
| Female | 35 | 15 (39.7) | 1.09 (0.40–2.92) | 0.857 | 0.857 | 1.00 (0.24–4.15) | 0.998 | 0.998 |
| Total | 508 | 220 (37.9) | ||||||
| Age category in years at screening | ||||||||
| > = 60 | 322 | 131 (30.7) | ref | |||||
| 20–39 | 51 | 39 (83.0) | 11.0 (5.10–23.7) | < .001 | < .001 | 13.8 (5.15–36.7) | < .001 | < .001 |
| 40–59 | 135 | 50 (31.2) | 1.02 (0.58–1.79) | 0.934 | 1.27 (0.54–2.99) | 0.544 | ||
| Total | 508 | 220 (37.9) | ||||||
| Race/Hispanic origin | ||||||||
| Non-Hispanic White | 307 | 114 (34.1) | ref | |||||
| Mexican American | 23 | 14 (67.1) | 3.93 (1.14–13.5) | 0.018 | < .001 | 3.41 (1.01–11.5) | 0.032 | 0.01 |
| Non-Hispanic Black | 126 | 59 (46.1) | 1.65 (1.12–2.43) | 0.006 | 1.91 (1.08–3.37) | 0.016 | ||
| Other Hispanic | 26 | 17 (64.3) | 3.47 (0.92–13.1) | 0.046 | 3.40 (0.53–21.6) | 0.159 | ||
| Other Race | 26 | 16 (52.3) | 2.12 (0.60–7.53) | 0.207 | 1.42 (0.28–7.28) | 0.647 | ||
| Total | 508 | 220 (37.9) | ||||||
| Served in a foreign country | ||||||||
| No | 243 | 86 (27.4) | ref | |||||
| Yes | 264 | 134 (48.6) | 2.51 (1.32–4.77) | 0.002 | 0.002 | 3.09 (1.85–5.17) | < .001 | < .001 |
| Total | 507 | 220 (38.1) | ||||||
| Education level | ||||||||
| College graduate or above | 147 | 51 (32.0) | ref | |||||
| 9-11th grade | 37 | 16 (33.4) | 1.07 (0.50–2.28) | 0.857 | 0.296 | |||
| High school graduate | 122 | 56 (41.1) | 1.48 (0.83–2.65) | 0.147 | ||||
| Less than 9th grade | 9 | 5 (42.6) | 1.58 (0.42–6.02) | 0.465 | ||||
| Some college or AA degree | 193 | 92 (41.8) | 1.53 (0.90–2.61) | 0.089 | ||||
| Total | 508 | 220 (37.9) | ||||||
| Marital status | ||||||||
| Separated | 11 | 5 (27.3) | ref | |||||
| Divorced | 75 | 30 (29.4) | 1.11 (0.36–3.42) | 0.847 | 0.029 | 1.09 (0.31–3.81) | 0.884 | 0.017 |
| Living with partner | 17 | 9 (60.2) | 4.02 (0.81–19.8) | 0.063 | 1.96 (0.42–9.05) | 0.349 | ||
| Married | 311 | 130 (36.3) | 1.52 (0.53–4.37) | 0.403 | 1.69 (0.55–5.14) | 0.318 | ||
| Never married | 48 | 21 (51.4) | 2.81 (0.72–10.9) | 0.104 | 1.67 (0.47–5.97) | 0.387 | ||
| Widowed | 46 | 25 (44.0) | 2.09 (0.66–6.62) | 0.175 | 3.45 (0.89–13.4) | 0.051 | ||
| Total | 508 | 220 (37.9) | ||||||
| Age in years at screening | 508 | 220 (37.9) | 0.97 (0.95–0.98) | < .001 | < .001 | |||
Total = Total number of observations
Hepatitis A antibody = Outcome variable label
Positive = Outcome value label of category of interest; cells show frequency of prevalent cases (and weighted prevalence in percent)
OR = Weighted odds ratio
95% CI = Weighted 95% confidence interval for the odds ratio
p-value = Class-level p-value
Type3 = Global/Type 3 p-value
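The combined table places unadjusted and adjusted estimates side by side and leaves the adjusted cells blank for variables that have no adjusted estimate, such as education level. A minimal Python sketch of that kind of merge is shown below. It is an illustration of the idea, not the %svy_printlogit implementation; the function name `combine` and the dictionary layout are assumptions, while the example values are copied from the table above.

```python
def combine(unadjusted, adjusted):
    """Left-join adjusted results onto unadjusted rows, keyed by (factor, level)."""
    blank = {"or_ci": "", "p_value": ""}
    rows = []
    for (factor, level), uni in unadjusted.items():
        # Levels with no adjusted estimate keep empty adjusted cells.
        adj = adjusted.get((factor, level), blank)
        rows.append({
            "factor": factor,
            "level": level,
            "unadj_or_ci": uni["or_ci"],
            "unadj_p": uni["p_value"],
            "adj_or_ci": adj["or_ci"],
            "adj_p": adj["p_value"],
        })
    return rows


# Two rows taken from the table above.
unadjusted = {
    ("Served in a foreign country", "Yes"): {"or_ci": "2.51 (1.32–4.77)", "p_value": "0.002"},
    ("Education level", "High school graduate"): {"or_ci": "1.48 (0.83–2.65)", "p_value": "0.147"},
}
adjusted = {
    ("Served in a foreign country", "Yes"): {"or_ci": "3.09 (1.85–5.17)", "p_value": "< .001"},
}

table = combine(unadjusted, adjusted)
```

Keying on the (factor, level) pair rather than on row position keeps the merge correct even when the two result sets list their rows in different orders.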