Jessica A Grembi, Elizabeth T Rogawski McQuade.
Abstract
Common statistical modeling methods do not necessarily produce the most relevant or interpretable effect estimates to communicate risk. Overreliance on the odds ratio and relative effect measures limits the potential impact of epidemiologic and public health research. We created a straightforward R package, called riskCommunicator, to facilitate the presentation of a variety of effect measures, including risk differences and ratios, the number needed to treat, incidence rate differences and ratios, and mean differences. The riskCommunicator package uses g-computation with parametric regression models and bootstrapping for confidence intervals to estimate effect measures in time-fixed data. We demonstrate the utility of the package using data from the Framingham Heart Study to estimate the effect of prevalent diabetes on the 24-year risk of cardiovascular disease or death. The package promotes the communication of public health-relevant effects and is accessible to a broad range of epidemiologists and health researchers with little to no expertise in causal inference methods or advanced coding.
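The g-computation procedure described above can be sketched in three steps: fit an outcome regression on the observed data, predict every participant's outcome with the exposure set to each level, then average and contrast those predictions. The NumPy sketch below is not riskCommunicator's R code. The helper names (`expit`, `fit_logistic`, `g_computation_binary`) and the simulated data are hypothetical, and the sketch covers only a binary exposure and a binary outcome modelled with logistic regression, mirroring gComp's binomial/logit option.

```python
import numpy as np


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-eta))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)                  # score vector
        hess = (X.T * (p * (1 - p))) @ X      # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def g_computation_binary(x, z, y):
    """Marginal effects of a binary exposure x on a binary outcome y, adjusting for z."""
    n = len(y)

    def design(xv):
        return np.column_stack([np.ones(n), xv, z])

    beta = fit_logistic(design(x), y)                 # 1. outcome model on observed data
    risk1 = expit(design(np.ones(n)) @ beta).mean()   # 2. everyone set to exposed
    risk0 = expit(design(np.zeros(n)) @ beta).mean()  #    everyone set to unexposed
    rd = risk1 - risk0                                # 3. contrast the averaged risks
    return {
        "risk1": risk1,
        "risk0": risk0,
        "RD": rd,
        "RR": risk1 / risk0,
        "OR": (risk1 / (1 - risk1)) / (risk0 / (1 - risk0)),
        # 1 / RD; read as a number needed to treat or to harm depending on the effect direction
        "NNT": 1 / rd,
    }


# Hypothetical data with one confounder z that affects both exposure and outcome.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = rng.binomial(1, expit(z))
y = rng.binomial(1, expit(-1 + 1.0 * x + 0.5 * z))
est = g_computation_binary(x, z, y)
print(est)
```

Because the averaged risks come from the same fitted model, every effect measure (difference, ratios, NNT) is derived from one pair of marginal risks rather than from separate models.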
Year: 2022 PMID: 35849588 PMCID: PMC9292119 DOI: 10.1371/journal.pone.0265368
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Arguments supplied to the gComp function in the riskCommunicator package.
| Argument | Description |
|---|---|
| data | (Required) A data.frame or tibble containing variables for Y, X, and Z or with variables matching the model variables specified in a user-supplied formula. Data set should also contain variables for the optional subgroup and offset, if they are specified. |
| outcome.type | (Required) Character argument to describe the outcome type. Acceptable responses, and the corresponding error distribution and link function used in the glm, include: |
| | (Default) A binomial distribution with link = ‘logit’ is used. Function returns the risk difference, risk ratio, odds ratio, and number needed to treat/harm. |
| | A Poisson distribution with link = ‘log’ is used. Function returns the incidence rate difference and incidence rate ratio. |
| | A negative binomial distribution with link = ‘log’ is used, where the theta parameter is estimated internally; ideal for over-dispersed count data. Function returns the incidence rate difference and incidence rate ratio. |
| | A Poisson distribution with link = ‘log’ is used; ideal for events/person-time outcomes. Function returns the incidence rate difference and incidence rate ratio. |
| | A negative binomial distribution with link = ‘log’ is used, where the theta parameter is estimated internally; ideal for over-dispersed events/person-time outcomes. Function returns the incidence rate difference and incidence rate ratio. |
| | A Gaussian distribution with link = ‘identity’ is used. Function returns the mean difference. |
| formula | (Optional) Default NULL (i.e. argument is optional). An object of class “formula” (or one that can be coerced to that class) which provides the complete model formula, similar to the formula for the glm function in R (e.g. ‘Y ~ X + Z1 + Z2 + Z3’). Can be supplied as a character or formula object. If no formula is provided, Y and X must be provided. |
| Y | (Optional) Default NULL (i.e. argument is optional). Character argument which specifies the outcome variable. Can optionally provide a formula instead of Y and X variables. |
| X | (Optional) Default NULL (i.e. argument is optional). Character argument which specifies the exposure variable (or treatment group assignment), which can be binary, categorical, or continuous. This variable can be supplied as a factor variable (for binary or categorical exposures) or a continuous variable. For binary/categorical exposures, X should be supplied as a factor with the lowest level set to the desired referent. Numeric variables are accepted, but will be centered. Character variables are not accepted and will throw an error. Can optionally provide a formula instead of Y and X variables. |
| Z | (Optional) Default NULL (i.e. argument is optional). List or single character vector which specifies the names of covariates or other variables to adjust for in the glm function. All variables should either be factors, continuous, or coded 0/1 (i.e. not character variables). Does not allow interaction terms. |
| subgroup | (Optional) Default NULL (i.e. argument is optional). Character argument that indicates subgroups for stratified analysis. Effects will be reported for each category of the subgroup variable. Variable will be automatically converted to a factor if not already. |
| offset | (Optional, only applicable for rate/count outcomes) Default NULL (i.e. argument is optional). Character argument which specifies the person-time denominator for rate outcomes to be included as an offset in the Poisson regression model. The offset variable should be numeric and on the linear scale; the function takes the natural log before including it in the model. |
| rate.multiplier | (Optional, only applicable for rate/count outcomes) Default 1. Numeric value signifying the person-time value to use in predictions; the offset variable will be set to this value when predicting under the counterfactual conditions. This value should be set to the person-time denominator desired for the rate difference measure and must be given in the units of the original offset variable (e.g. if the offset variable is in days and the desired rate difference is the rate per 100 person-years, rate.multiplier should be set to 365.25*100). |
| exposure.scalar | (Optional, only applicable for continuous exposure) Default 1. Numeric value to scale effects with a continuous exposure. This option facilitates reporting effects for an interpretable contrast (i.e. magnitude of difference) within the continuous exposure. For example, if the continuous exposure is age in years, a multiplier of 10 would result in estimates per 10-year increase in age rather than per a 1-year increase in age. |
| exposure.center | (Optional, only applicable for continuous exposure) Default TRUE. Logical or numeric value to center a continuous exposure. This option facilitates reporting effects at the mean value of the exposure variable, and allows for a mean value to be provided directly to the function in cases where bootstrap resampling is being conducted and a standardized centering value should be used across all bootstraps. See note below on continuous exposure variables for additional details. |
| R | (Optional) Default 200. The number of data resamples to be conducted to produce the bootstrap confidence interval of the estimate. |
| clusterID | (Optional) Default NULL (i.e. argument is optional). Character argument which specifies the variable name for the unique identifier for clusters. This option specifies that clustering should be accounted for in the calculation of confidence intervals. The clusterID will be used as the level for resampling in the bootstrap procedure. |
| parallel | (Optional) Default “no”. The type of parallel operation to be used. Available options (besides the default of no parallel processing) are “multicore” (not available on Windows) and “snow”. This argument is passed directly to boot. See the note about setting seeds and parallel computing. |
| ncpus | (Optional, only used if parallel is set to “multicore” or “snow”) Default 1. Integer argument for the number of CPUs available for parallel processing (i.e. the number of parallel operations to be used). This argument is passed directly to boot. |
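For rate outcomes, the offset and rate.multiplier arguments work together: the natural log of person-time enters the model as an offset, and at prediction time every participant's person-time is replaced by rate.multiplier, so the averaged predictions are rates per rate.multiplier units of the offset. The NumPy sketch below illustrates that mechanism with a Poisson log-linear model fitted by iteratively reweighted least squares (IRLS). It is not the package's implementation; the function names (`fit_poisson`, `g_computation_rate`) and the simulated cohort are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, offset, n_iter=100, tol=1e-10):
    """Poisson log-linear regression with a fixed offset, fitted by IRLS."""
    mu = y + 0.1                      # starting values in the style of R's poisson family
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        working = eta - offset + (y - mu) / mu   # working response with the offset removed
        XtW = X.T * mu                           # IRLS weights equal mu for the Poisson log link
        new_beta = np.linalg.solve(XtW @ X, XtW @ working)
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        eta = X @ beta + offset
        mu = np.exp(eta)
        if done:
            break
    return beta


def g_computation_rate(x, z, events, person_time, rate_multiplier=1.0):
    """Marginal IRD and IRR for a binary exposure x, adjusting for covariate z."""
    n = len(events)

    def design(xv):
        return np.column_stack([np.ones(n), xv, z])

    # person_time is on the linear scale; its natural log is the model offset.
    beta = fit_poisson(design(x), events, np.log(person_time))
    # For prediction, every participant's person-time is set to rate_multiplier.
    log_m = np.log(rate_multiplier)
    rate1 = np.mean(np.exp(design(np.ones(n)) @ beta + log_m))
    rate0 = np.mean(np.exp(design(np.zeros(n)) @ beta + log_m))
    return {"IRD": rate1 - rate0, "IRR": rate1 / rate0}


# Hypothetical cohort with person-time recorded in days.
rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)
x = rng.binomial(1, 1 / (1 + np.exp(-z)))
days = rng.uniform(100, 3650, size=n)
events = rng.poisson(np.exp(-8 + 0.7 * x + 0.3 * z) * days)

per_day = g_computation_rate(x, z, events, days)                               # rates per person-day
per_100py = g_computation_rate(x, z, events, days, rate_multiplier=365.25 * 100)  # per 100 person-years
print(per_day, per_100py)
```

Since rate.multiplier only rescales both averaged predictions, the incidence rate ratio is unchanged while the incidence rate difference scales linearly (by 36,525 when moving from per-day rates to rates per 100 person-years).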
Effect of prevalent diabetes at the beginning of the study on the 24-year risk of cardiovascular disease or death among 4,240 participants in the Framingham Heart Study.
| Effect measure | riskCommunicator: marginal effect estimate (95% CI)† | Standard regression models*: covariate-conditional effect estimate (95% CI)† |
|---|---|---|
| Risk difference | 0.29 (0.20, 0.39) | N/A‡ |
| Risk ratio | 1.70 (1.48, 1.97) | 1.49 (1.33, 1.66) |
| Odds ratio | 4.55 (2.77, 9.09) | 4.55 (2.66, 7.78) |
| Number needed to treat | 3.48 | N/A |
*Log-linear regression for the risk difference, Poisson approximation of log-binomial regression with robust variance for the risk ratio, logistic regression for the odds ratio with Wald-based confidence intervals.
†Adjusted for patient’s age, sex, body mass index (BMI), smoking status (current smoker or not), and prevalence of hypertension.
‡Log-linear model did not converge.
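The number needed to treat in the table is the reciprocal of the marginal risk difference. Writing $\hat{R}_1$ and $\hat{R}_0$ for the standardized 24-year risks with and without prevalent diabetes (notation introduced here, not taken from the paper), the reported values check out as follows:

$$
\mathrm{NNT} = \frac{1}{\mathrm{RD}} = \frac{1}{\hat{R}_1 - \hat{R}_0},
\qquad
\frac{1}{3.48} \approx 0.287 \approx 0.29 .
$$

Because $1/0.29 \approx 3.45$, the reported 3.48 was presumably computed from the unrounded risk difference. Since diabetes raises risk here, the figure reads naturally as a number needed to harm: under the model's assumptions, roughly one additional case of cardiovascular disease or death over 24 years for about every 3.5 people with prevalent diabetes rather than without it.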
Fig 1. Histograms and quantile-quantile (Q-Q) plots of bootstrap iterations (R = 1000) obtained from the binary.res output for each effect measure.
NOTE: All ratio values are plotted as the natural log of the actual estimate.
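Fig 1 summarizes the bootstrap replicates behind the confidence intervals. The NumPy sketch below shows the general resampling scheme: rows (or whole clusters, when cluster identifiers are given, as the clusterID argument describes) are drawn with replacement R times, the statistic is recomputed on each resample, and a 95% interval is formed. For brevity the statistic is a crude log risk ratio rather than a full g-computation refit, and ratios are handled on the log scale as in Fig 1. This excerpt does not state whether riskCommunicator reports percentile or normal-approximation intervals, so the sketch returns both; the Q-Q plots in Fig 1 are the kind of check that matters for the normal approximation. All function names and data are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def bootstrap_ci(rows, statistic, R=200, alpha=0.05, cluster_ids=None, seed=0):
    """Bootstrap statistic(rows); return the estimate, two (1 - alpha) CIs, and the replicates.

    Rows are resampled with replacement; if cluster_ids is given, whole clusters
    are resampled instead so that within-cluster correlation is preserved.
    """
    rng = np.random.default_rng(seed)
    n = len(rows)
    estimate = statistic(rows)
    if cluster_ids is not None:
        members = [np.flatnonzero(cluster_ids == c) for c in np.unique(cluster_ids)]
    reps = np.empty(R)
    for r in range(R):
        if cluster_ids is None:
            idx = rng.integers(0, n, size=n)
        else:
            drawn = rng.integers(0, len(members), size=len(members))
            idx = np.concatenate([members[k] for k in drawn])
        reps[r] = statistic(rows[idx])
    percentile = tuple(np.quantile(reps, [alpha / 2, 1 - alpha / 2]))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = reps.std(ddof=1)
    normal = (estimate - z_crit * se, estimate + z_crit * se)
    return estimate, percentile, normal, reps


def log_risk_ratio(rows):
    """Crude log risk ratio; column 0 is the 0/1 exposure, column 1 the 0/1 outcome."""
    x, y = rows[:, 0], rows[:, 1]
    return np.log(y[x == 1].mean() / y[x == 0].mean())


# Hypothetical data: 2,000 participants, about 30% exposed.
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=2000)
y = rng.binomial(1, np.where(x == 1, 0.6, 0.35))
rows = np.column_stack([x, y])

est, pct, norm, reps = bootstrap_ci(rows, log_risk_ratio, R=1000)
print("RR", np.exp(est), "percentile CI", np.exp(pct), "normal CI", np.exp(norm))

# Same data, resampling 200 hypothetical clusters of 10 rows each.
cluster = np.arange(2000) // 10
est_c, pct_c, norm_c, reps_c = bootstrap_ci(rows, log_risk_ratio, R=200, cluster_ids=cluster)
```

Exponentiating the log-scale limits gives the interval for the ratio itself; when the replicates look roughly normal on the log scale (as a Q-Q plot would show), the two intervals should be close.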
Fig 2. Effect of having prevalent diabetes at the beginning of the study on the 24-year risk of cardiovascular disease or death overall and stratified by sex among 4,240 participants in the Framingham Heart Study.
A) Incidence rate ratio. B) Incidence rate difference. riskCommunicator was used to obtain marginal effect estimates (purple) and Poisson regression was used to obtain covariate-conditional estimates (green; not available for incidence rate difference). All models were adjusted for patient’s age, sex, body mass index, smoking status (current smoker or not), and prevalence of hypertension. Each point represents the point estimate and error bars show the 95% CI.
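Fig 2 reports effects within each sex. One way to obtain subgroup-specific marginal effects by g-computation is to include an exposure-by-subgroup interaction in the outcome model, predict every participant's rate under each exposure level, and average the predictions separately within each subgroup. This excerpt does not describe how the subgroup argument is implemented internally, so treat the interaction approach, the function names (`fit_poisson`, `stratified_rate_effects`), and the simulated cohort in the NumPy sketch below as assumptions.

```python
import numpy as np


def fit_poisson(X, y, offset, n_iter=100, tol=1e-10):
    """Poisson log-linear regression with a fixed offset, fitted by IRLS."""
    mu = y + 0.1                      # starting values in the style of R's poisson family
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        working = eta - offset + (y - mu) / mu   # working response with the offset removed
        XtW = X.T * mu                           # IRLS weights equal mu for the Poisson log link
        new_beta = np.linalg.solve(XtW @ X, XtW @ working)
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        eta = X @ beta + offset
        mu = np.exp(eta)
        if done:
            break
    return beta


def stratified_rate_effects(x, s, z, events, person_time, rate_multiplier=1.0):
    """Marginal IRD and IRR for binary exposure x within each level of subgroup s."""
    n = len(events)

    def design(xv):
        # Intercept, exposure, subgroup, exposure-by-subgroup interaction, covariate.
        return np.column_stack([np.ones(n), xv, s, xv * s, z])

    beta = fit_poisson(design(x), events, np.log(person_time))
    log_m = np.log(rate_multiplier)
    pred1 = np.exp(design(np.ones(n)) @ beta + log_m)   # everyone set to exposed
    pred0 = np.exp(design(np.zeros(n)) @ beta + log_m)  # everyone set to unexposed
    results = {}
    for level in np.unique(s):
        keep = s == level                               # average within the subgroup only
        r1, r0 = pred1[keep].mean(), pred0[keep].mean()
        results[int(level)] = {"IRD": r1 - r0, "IRR": r1 / r0}
    return results


# Hypothetical cohort: s is a binary subgroup (e.g. sex), person-time in days,
# with an exposure effect in subgroup 0 and none in subgroup 1.
rng = np.random.default_rng(3)
n = 20000
z = rng.normal(size=n)
s = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, 1 / (1 + np.exp(-z)))
days = rng.uniform(100, 3650, size=n)
log_rate = -8 + 0.7 * x + 0.2 * s - 0.7 * x * s + 0.3 * z
events = rng.poisson(np.exp(log_rate) * days)

res = stratified_rate_effects(x, s, z, events, days, rate_multiplier=365.25 * 100)
for level, eff in res.items():
    print(level, eff)
```

Fitting one model with an interaction, rather than separate models per subgroup, keeps the covariate adjustment shared across subgroups; whether that matches the package's behavior is an assumption to verify against its documentation.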