Salvatore Fasola, Laura Montalbano, Giovanna Cilluffo, Benjamin Cuer, Velia Malizia, Giuliana Ferrante, Isabella Annesi-Maesano, Stefania La Grutta.
Abstract
When investigating disease etiology, twin data provide a unique opportunity to control for confounding and to disentangle the roles of the human genome and exposome. However, exploiting this potential depends on using appropriate statistical methods. We aimed to critically review the statistical approaches used in twin studies relating exposure to early life health conditions. We searched PubMed, Scopus, Web of Science, and Embase (2011-2021). We identified 32 studies and nine classes of methods. Five were conditional approaches (within-pair analyses): additive-common-erratic (ACE) models (11 studies), generalized linear mixed models (GLMMs, five studies), generalized linear models (GLMs) with fixed pair effects (four studies), within-pair difference analyses (three studies), and paired-sample tests (two studies). Four were marginal approaches (unpaired analyses): generalized estimating equations (GEE) models (five studies), GLMs with cluster-robust standard errors (six studies), GLMs (one study), and independent-sample tests (one study). ACE models are suitable for assessing heritability but require adaptations for binary outcomes and repeated measurements. Conditional models can adjust by design for shared confounders, and GLMMs are suitable for repeated measurements. Marginal models may lead to invalid inference. By highlighting the strengths and limitations of commonly applied statistical methods, this review may be helpful for researchers using twin designs.
Keywords: children; exposome; genome; health; statistical methods; twin data
Year: 2021 PMID: 34886424 PMCID: PMC8657152 DOI: 10.3390/ijerph182312696
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Flow diagram showing the study selection process. * Reasons for exclusion: reviews (n = 2), abstracts (n = 11), statistical analyses included singletons and twins together (n = 1), different outcomes (n = 3), twin mothers (n = 1), adult twins (n = 1), and no statistical analyses (n = 1).
Classes of statistical methods used in the reviewed studies.
| Class of Methods | Strengths | Limitations | R Libraries |
|---|---|---|---|
| ACE models | Confounders can be included; optimal inference; shared confounders are adjusted for by design; genetic contribution can be estimated | Require adaptations for binary outcomes and repeated measurements | umx |
| Generalized linear mixed models (GLMMs) | Suitable for binary outcomes; confounders can be included; optimal inference; shared confounders are adjusted for by design; suitable for repeated measurements | Genetic contribution cannot be estimated | lme4 |
| GLMs with fixed pair effects | Suitable for binary outcomes; individual-level confounders can be included; shared confounders are adjusted for by design | Shared confounders cannot be included; estimators may be sub-optimal; unsuitable for repeated measurements; genetic contribution cannot be estimated | stats |
| Within-pair difference analyses | Individual-level confounders can be included; optimal inference; shared confounders are adjusted for by design | Unsuitable for binary outcomes; shared confounders cannot be included; unsuitable for repeated measurements; genetic contribution cannot be estimated | stats |
| Paired-sample tests | Optimal inference; shared confounders are adjusted for by design | Require a binary exposure; require adaptations for binary outcomes; confounders cannot be included; unsuitable for repeated measurements; genetic contribution cannot be estimated | stats |
| Generalized estimating equations (GEE) models | Suitable for binary outcomes; confounders can be included; optimal inference | Shared confounders are not adjusted for by design; require adaptations for repeated measurements; genetic contribution cannot be estimated | gee |
| Generalized linear models (GLMs) with cluster-robust standard errors | Suitable for binary outcomes; confounders can be included; optimal standard error estimators | Sub-optimal effect-size estimators; shared confounders are not adjusted for by design; unsuitable for repeated measurements; genetic contribution cannot be estimated | sandwich |
| Generalized linear models (GLMs) | Suitable for binary outcomes; confounders can be included | Sub-optimal inference; shared confounders are not adjusted for by design; unsuitable for repeated measurements; genetic contribution cannot be estimated | stats |
| Independent-sample tests | Suitable for binary outcomes | Require a binary exposure; confounders cannot be included; sub-optimal inference; shared confounders are not adjusted for by design; unsuitable for repeated measurements; genetic contribution cannot be estimated | stats |
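
The contrast the table draws between conditional and marginal approaches ("shared confounders are adjusted for by design" versus not) can be illustrated with a small simulation. The sketch below is not taken from the reviewed studies, and it uses Python with NumPy rather than the R libraries listed in the table. The data-generating model, the function names, and all parameter values are assumptions chosen for illustration: each twin pair shares an unmeasured confounder that raises both the exposure and a continuous outcome.

```python
import numpy as np

def simulate_twin_pairs(n_pairs=5000, beta=0.5, seed=0):
    """Simulate a continuous exposure x and outcome y for twin pairs.

    Hypothetical model: both twins share an unmeasured confounder u
    that affects x and y; beta is the true exposure effect.
    Returns two arrays of shape (n_pairs, 2), one column per twin.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n_pairs)                     # shared within each pair
    x = u[:, None] + rng.normal(size=(n_pairs, 2))   # exposure
    y = beta * x + 2.0 * u[:, None] + rng.normal(size=(n_pairs, 2))
    return x, y

def marginal_slope(x, y):
    """Unpaired least-squares slope that ignores the pairing."""
    xf, yf = x.ravel(), y.ravel()
    return np.cov(xf, yf)[0, 1] / np.var(xf, ddof=1)

def within_pair_slope(x, y):
    """Slope from regressing within-pair outcome differences on
    within-pair exposure differences (no intercept); the shared
    confounder cancels when the differences are taken."""
    dx = x[:, 0] - x[:, 1]
    dy = y[:, 0] - y[:, 1]
    return (dx @ dy) / (dx @ dx)

x, y = simulate_twin_pairs()
print("marginal estimate:   ", round(marginal_slope(x, y), 2))
print("within-pair estimate:", round(within_pair_slope(x, y), 2))
```

Under these assumptions the within-pair estimate should land close to the true effect of 0.5, while the marginal estimate absorbs the confounder and is pulled well above it. The sketch covers only the continuous-outcome case, in line with the table's note that within-pair difference analyses are unsuitable for binary outcomes.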
Figure 2. Summary of recommendations regarding which types of models researchers should favor under different conditions (research needs, data structure, and desired parameter meaning). 1 Conditional logistic regression is recommended. 2 Accommodated for through the use of dummy variables. 3 Either the outcome or the exposure must be binary. 4 Only in models with identity link function and in log-linear models.
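
For the simplest binary case touched on in the Figure 2 footnotes, the conditional analysis can be computed by hand. The sketch below assumes twin pairs that are discordant for a binary exposure (one exposed and one unexposed twin per pair), a binary outcome, and no further covariates. In that special case the conditional logistic regression estimate of the odds ratio reduces to the ratio of the two kinds of outcome-discordant pairs, and the matching paired-sample test is the exact McNemar test. The function name and the example counts are hypothetical.

```python
from math import comb

def discordant_pair_analysis(b, c):
    """Conditional odds ratio and exact McNemar p-value for twin pairs.

    b: pairs in which only the exposed twin has the outcome
    c: pairs in which only the unexposed twin has the outcome
    Pairs in which both or neither twin has the outcome carry no
    information in the conditional likelihood and are not counted.
    """
    n = b + c
    if n == 0:
        raise ValueError("no outcome-discordant pairs")
    # Conditional maximum-likelihood odds ratio for 1:1 matched pairs.
    odds_ratio = b / c if c > 0 else float("inf")
    # Exact two-sided test: under the null, b ~ Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return odds_ratio, p_value

# Hypothetical counts, for illustration only.
odds_ratio, p_value = discordant_pair_analysis(b=30, c=10)
print(f"conditional OR = {odds_ratio:.2f}, exact McNemar p = {p_value:.4f}")
```

With individual-level covariates or a non-binary exposure this shortcut no longer applies, and a full conditional logistic regression fit would be needed.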