Julian Little, Julian P T Higgins, John P A Ioannidis, David Moher, France Gagnon, Erik von Elm, Muin J Khoury, Barbara Cohen, George Davey-Smith, Jeremy Grimshaw, Paul Scheet, Marta Gwinn, Robin E Williamson, Guang Yong Zou, Kim Hutchings, Candice Y Johnson, Valerie Tait, Miriam Wiens, Jean Golding, Cornelia van Duijn, John McLaughlin, Andrew Paterson, George Wells, Isabel Fortier, Matthew Freedman, Maja Zecevic, Richard King, Claire Infante-Rivard, Alex Stewart, Nick Birkett.
Abstract
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modeling haplotype variation, Hardy-Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis.
Year: 2009 PMID: 19189221 PMCID: PMC2764094 DOI: 10.1007/s10654-008-9302-y
Source DB: PubMed Journal: Eur J Epidemiol ISSN: 0393-2990 Impact factor: 8.082
STREGA reporting recommendations, extended from the STROBE Statement
| Item | Item number | STROBE guideline | Extension for genetic association studies (STREGA) |
|---|---|---|---|
| Title and abstract | 1 | (a) Indicate the study’s design with a commonly used term in the title or the abstract. | |
| | | (b) Provide in the abstract an informative and balanced summary of what was done and what was found. | |
| Introduction | | | |
| Background rationale | 2 | Explain the scientific background and rationale for the investigation being reported. | |
| Objectives | 3 | State specific objectives, including any pre-specified hypotheses. | State if the study is the first report of a genetic association, a replication effort, or both. |
| Methods | | | |
| Study design | 4 | Present key elements of study design early in the paper. | |
| Setting | 5 | Describe the setting, locations and relevant dates, including periods of recruitment, exposure, follow-up, and data collection. | |
| Participants | 6 | (a) | Give information on the criteria and methods for selection of subsets of participants from a larger study, when relevant. |
| | | (b) | |
| Variables | 7 | (a) Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable. | (b) Clearly define genetic exposures (genetic variants) using a widely used nomenclature system. Identify variables likely to be associated with population stratification (confounding by ethnic origin). |
| Data sources/measurement | 8ᵃ | (a) For each variable of interest, give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group. | (b) Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates and call rates. State the laboratory/centre where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches. |
| Bias | 9 | (a) Describe any efforts to address potential sources of bias. | (b) For quantitative outcome variables, specify if any investigation of potential bias resulting from pharmacotherapy was undertaken. If relevant, describe the nature and magnitude of the potential bias, and explain what approach was used to deal with this. |
| Study size | 10 | Explain how the study size was arrived at. | |
| Quantitative variables | 11 | Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen, and why. | If applicable, describe how effects of treatment were dealt with. |
| Statistical methods | 12 | (a) Describe all statistical methods, including those used to control for confounding. | State software version used and options (or settings) chosen. |
| | | (b) Describe any methods used to examine subgroups and interactions. | |
| | | (c) Explain how missing data were addressed. | |
| | | (d) | |
| | | (e) Describe any sensitivity analyses. | |
| | | | (f) State whether Hardy–Weinberg equilibrium was considered and, if so, how. |
| | | | (g) Describe any methods used for inferring genotypes or haplotypes. |
| | | | (h) Describe any methods used to assess or address population stratification. |
| | | | (i) Describe any methods used to address multiple comparisons or to control risk of false positive findings. |
| | | | (j) Describe any methods used to address and correct for relatedness among subjects. |
| Results | | | |
| Participants | 13ᵃ | (a) Report the numbers of individuals at each stage of the study, e.g., numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed. | Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful. |
| | | (b) Give reasons for non-participation at each stage. | |
| | | (c) Consider use of a flow diagram. | |
| Descriptive data | 14ᵃ | (a) Give characteristics of study participants (e.g., demographic, clinical, social) and information on exposures and potential confounders. | Consider giving information by genotype. |
| | | (b) Indicate the number of participants with missing data for each variable of interest. | |
| | | (c) | |
| Outcome data | 15ᵃ | | Report outcomes (phenotypes) for each genotype category over time. |
| | | | Report numbers in each genotype category. |
| | | | Report outcomes (phenotypes) for each genotype category. |
| Main results | 16 | (a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (e.g., 95% confidence intervals). Make clear which confounders were adjusted for and why they were included. | |
| | | (b) Report category boundaries when continuous variables were categorized. | |
| | | (c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period. | |
| | | | (d) Report results of any adjustments for multiple comparisons. |
| Other analyses | 17 | (a) Report other analyses done—e.g., analyses of subgroups and interactions, and sensitivity analyses. | |
| | | | (b) If numerous genetic exposures (genetic variants) were examined, summarize results from all analyses undertaken. |
| | | | (c) If detailed results are available elsewhere, state how they can be accessed. |
| Discussion | | | |
| Key results | 18 | Summarize key results with reference to study objectives. | |
| Limitations | 19 | Discuss limitations of the study, taking into account sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias. | |
| Interpretation | 20 | Give a cautious overall interpretation of results considering objectives, limitations, multiplicity of analyses, results from similar studies, and other relevant evidence. | |
| Generalizability | 21 | Discuss the generalizability (external validity) of the study results. | |
| Other information | | | |
| Funding | 22 | Give the source of funding and the role of the funders for the present study and, if applicable, for the original study on which the present article is based. | |
STREGA STrengthening the REporting of Genetic Association Studies, STROBE Strengthening the Reporting of Observational Studies in Epidemiology
ᵃ Give information separately for cases and controls in case-control studies and, if applicable, for exposed and unexposed groups in cohort and cross-sectional studies.
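Item 8(b) asks for genotyping error rates, and the rationale for this is that non-differential genotyping errors usually bias associations towards the null. The arithmetic can be sketched in Python for the simplest case only, assuming a dichotomized genotype (carrier versus non-carrier) and the same calling sensitivity and specificity in cases and controls. All frequencies and accuracies below are illustrative assumptions, not values from any study.

```python
def odds(p: float) -> float:
    """Convert a proportion into odds."""
    return p / (1.0 - p)


def observed_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Carrier frequency seen after genotyping error.

    True carriers are called carriers with probability `sensitivity`;
    true non-carriers are wrongly called carriers with probability 1 - `specificity`.
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio comparing carrier frequency in cases with that in controls."""
    return odds(p_cases) / odds(p_controls)


# Illustrative (hypothetical) true carrier frequencies.
p_cases, p_controls = 0.30, 0.20
# Non-differential error: identical calling accuracy in both groups (assumed values).
se, sp = 0.95, 0.95

true_or = odds_ratio(p_cases, p_controls)
obs_or = odds_ratio(
    observed_prevalence(p_cases, se, sp),
    observed_prevalence(p_controls, se, sp),
)
print(f"true OR = {true_or:.3f}, observed OR = {obs_or:.3f}")
```

With these made-up inputs the odds ratio shrinks from about 1.71 to about 1.58. The shift towards 1 is guaranteed only in this two-category, non-differential setting with sensitivity plus specificity above 1; with three genotype categories the direction can differ, which fits the hedged "usually" in the rationale.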
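Item 13(a), read together with the footnote, asks for the numbers of individuals in whom genotyping was attempted and succeeded, given separately for cases and controls. A minimal Python sketch of that tally follows. The genotype lists are hypothetical, and recording a failed call as `None` is an assumption made only for illustration.

```python
from typing import Optional, Sequence, Tuple


def call_summary(calls: Sequence[Optional[str]]) -> Tuple[int, int, float]:
    """Return (attempted, successful, call rate) for one group.

    Assumption for this sketch: a failed genotype call is recorded as None.
    """
    attempted = len(calls)
    successful = sum(1 for genotype in calls if genotype is not None)
    call_rate = successful / attempted if attempted else float("nan")
    return attempted, successful, call_rate


# Hypothetical genotype calls for one variant, kept separate by case-control status.
groups = {
    "cases": ["AA", "AG", None, "GG", "AG", "AA", None, "AG"],
    "controls": ["AA", "AA", "AG", "GG", "AG", "AA", "AG", None],
}

for name, calls in groups.items():
    attempted, successful, rate = call_summary(calls)
    print(f"{name}: attempted={attempted}, successful={successful}, call rate={rate:.1%}")
```

These made-up data give a call rate of 75.0% in cases and 87.5% in controls. Reporting the counts per group makes such a gap visible, which matters because differential call rates are one of the routes to selection bias listed in the rationale table.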
Rationale for inclusion of topics in the STREGA recommendations
| Specific issue in genetic association studies | Rationale for inclusion in STREGA | Item(s) in STREGA | Specific suggestions for reporting |
|---|---|---|---|
| Genotyping errors (misclassification of exposure) | Non-differential genotyping errors will usually bias associations towards the null […]. | 8(b): Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates and call rates. State the laboratory/centre where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches. 13(a): Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful. | Factors affecting the potential extent of misclassification (information bias) of genotype include the types and quality of samples, timing of collection, and the method used for genotyping […]. When high throughput platforms are used, it is important to report not only the platform used but also the allele calling algorithm and its version. Different calling algorithms have different strengths and weaknesses ([…]). For some high throughput platforms, the user may choose to assign genotypes using all of the data from the study simultaneously, or in smaller batches, such as by plate […]. For case-control studies, whether genotyping was done blind to case-control status should be reported, along with the reason for this decision. |
| Population stratification (confounding by ethnic origin) | When study sub-populations differ both in allele (or genotype) frequencies and in disease risks, confounding will occur if these sub-populations are unevenly distributed across exposure groups (or between cases and controls). | 12(h): Describe any methods used to assess or address population stratification. | In view of the debate about the potential implications of population stratification for the validity of genetic association studies, transparent reporting of the methods used to address this potential problem, or a statement that none was used, is important for allowing the empirical evidence to accrue. Ethnicity information should be presented (see, for example, Winker […]). As several methods of adjusting for population stratification have been proposed […] |
| Modeling haplotype variation | In the designs considered in this article, haplotypes have to be inferred because family information is not available. There are diverse methods for inferring haplotypes. | 12(g): Describe any methods used for inferring genotypes or haplotypes. | When discrete “windows” are used to summarize haplotypes, variation in how these are defined may complicate comparisons across studies, as results may be sensitive to the choice of windows. Related “imputation” strategies are also in use […]. It is important to give details on haplotype inference and, when possible, its uncertainty. Additional considerations for reporting include the strategy for dealing with rare haplotypes, window size and construction (if used), and choice of software. |
| Hardy–Weinberg equilibrium (HWE) | Departure from Hardy–Weinberg equilibrium may indicate errors or peculiarities in the data […]. | 12(f): State whether Hardy–Weinberg equilibrium was considered and, if so, how. | Any statistical tests or measures should be described, as should any procedure to allow for deviations from Hardy–Weinberg equilibrium in evaluating genetic associations […]. |
| Replication | Publications that present and synthesize data from several studies in a single report are becoming more common. | 3: State if the study is the first report of a genetic association, a replication effort, or both. | The selected criteria for claiming successful replication should also be explicitly documented. |
| Selection of participants | Selection bias may occur if (i) genetic associations are investigated in one or more subsets of participants (sub-samples) from a particular study; or (ii) there is differential non-participation in groups being compared; or (iii) there are differential genotyping call rates in groups being compared. | 6(a): Give information on the criteria and methods for selection of subsets of participants from a larger study, when relevant. 13(a): Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful. | Inclusion and exclusion criteria, sources and methods of selection of sub-samples should be specified, stating whether these were based on a priori or post hoc considerations. |
| Rationale for choice of genes and variants investigated | Without an explicit rationale, it is difficult to judge the potential for selective reporting of study results. There is strong empirical evidence from randomized controlled trials that reporting of trial outcomes is frequently incomplete and biased in favor of statistically significant findings […]. | 7(b): Clearly define genetic exposures (genetic variants) using a widely used nomenclature system. Identify variables likely to be associated with population stratification (confounding by ethnic origin). | The scientific background and rationale for investigating the genes and variants should be reported. For genome-wide association studies, it is important to specify what initial testing platforms were used and how gene variants are selected for further testing in subsequent stages. This may involve statistical considerations (for example, selection of […]). Guidelines for human gene nomenclature have been published by the Human Gene Nomenclature Committee […]. |
| Treatment effects in studies of quantitative traits | A study of a quantitative variable may be compromised when the trait is subject to the effects of a treatment: for example, a study of a lipid-related trait in which several individuals are taking lipid-lowering medication. Without appropriate correction, this can lead to bias in estimating the effect and loss of power. | 9(b): For quantitative outcome variables, specify if any investigation of potential bias resulting from pharmacotherapy was undertaken. If relevant, describe the nature and magnitude of the potential bias, and explain what approach was used to deal with this. 11: If applicable, describe how effects of treatment were dealt with. | Several methods of adjusting for treatment effects have been proposed […]. |
| Statistical methods | Analysis methods should be transparent and replicable, and genetic association studies are often performed using specialized software. | 12(a): State software version used and options (or settings) chosen. | |
| Relatedness | The methods of analysis used in family based studies are different from those used in studies that are based on unrelated cases and controls. Moreover, even in studies that are based on apparently unrelated cases and controls, some individuals may be connected as (distant) relatives; this is particularly common in small, isolated populations, for example, Iceland. This may need to be probed with appropriate methods and adjusted for in the analysis of the data. | 12(j): Describe any methods used to address and correct for relatedness among subjects. | For the great majority of studies in which samples are drawn from large, non-isolated populations, relatedness is typically negligible, and results would not change materially whether or not relatedness is taken into account. This may not be the case in isolated populations or those with considerable inbreeding. If investigators have assessed relatedness, they should state the method used […]. |
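| Reporting of descriptive and outcome data | The synthesis of findings across studies depends on the availability of sufficiently detailed data. | 14(a): Consider giving information by genotype. 15: Report outcomes (phenotypes) for each genotype category over time (cohort studies), numbers in each genotype category (case-control studies), or outcomes (phenotypes) for each genotype category (cross-sectional studies). | |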
| Volume of data | The key problem is that of possible false-positive results and selective reporting of these. Type I errors are particularly relevant to the conduct of genome-wide association studies. A large search among hundreds of thousands of genetic variants can be expected by chance alone to find thousands of false positive results (odds ratios significantly different from 1.0). | 12(i): Describe any methods used to address multiple comparisons or to control risk of false positive findings. 16(d): Report results of any adjustments for multiple comparisons. 17(b): If numerous genetic exposures (genetic variants) were examined, summarize results from all analyses undertaken. 17(c): If detailed results are available elsewhere, state how they can be accessed. | Genome-wide association studies collect information on a very large number of genetic variants concomitantly. Initiatives to make the entire database transparent and available online may supply a definitive solution to the problem of selective reporting […]. Availability of raw data may help interested investigators reproduce the published analyses and also pursue additional analyses. A potential drawback of public data availability is that investigators using the data second-hand may not be aware of limitations or other problems that were originally encountered, unless these are also transparently reported. In this regard, collaboration of the data users with the original investigators may be beneficial. Issues of consent and confidentiality […]. The volume of data analyzed should also be considered in the interpretation of findings. Examples of methods of summarizing results include giving distribution of […] |
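The population stratification rationale says confounding arises when sub-populations differ in both allele frequency and disease risk and are unevenly represented among cases and controls. The made-up counts below build exactly that situation in Python: within each of two hypothetical sub-populations the variant is unrelated to disease (odds ratio 1), yet the pooled case-control table shows an odds ratio of 1.625. Stratifying with a Mantel–Haenszel odds ratio is used here only as the simplest illustration when sub-population membership is known; STREGA does not prescribe it, and methods that infer structure from genotype data (such as genomic control or principal components adjustment) are common alternatives.

```python
from typing import List, Tuple

# Each 2x2 table is (a, b, c, d):
#   a = carrier cases,    b = non-carrier cases,
#   c = carrier controls, d = non-carrier controls.
Table = Tuple[int, int, int, int]


def odds_ratio(t: Table) -> float:
    """Crude odds ratio a*d / (b*c) for one 2x2 table."""
    a, b, c, d = t
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: List[Table]) -> float:
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical sub-population 1: carrier frequency 0.6, supplies most of the cases.
pop1: Table = (480, 320, 300, 200)
# Hypothetical sub-population 2: carrier frequency 0.2, supplies fewer cases.
pop2: Table = (40, 160, 100, 400)

# Ignoring sub-population membership simply adds the two tables together.
pooled: Table = tuple(x + y for x, y in zip(pop1, pop2))

print("OR within sub-population 1:", odds_ratio(pop1))
print("OR within sub-population 2:", odds_ratio(pop2))
print("crude pooled OR:", odds_ratio(pooled))
print("Mantel-Haenszel OR:", mantel_haenszel_or([pop1, pop2]))
```

The spurious pooled association comes entirely from the uneven mix of sub-populations, which is why item 12(h) asks authors to state what, if anything, was done about it.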
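Item 12(f) asks whether Hardy–Weinberg equilibrium was considered and, if so, how. One common choice, shown here only as a sketch, is a chi-square goodness-of-fit test with 1 degree of freedom for a biallelic variant, often applied to controls in case-control studies. The genotype counts are hypothetical, the function name is illustrative, and an exact test is often preferred when genotype counts are small.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic variant.

    n_aa, n_ab, n_bb: observed counts of genotypes AA, AB and BB.
    Assumes both alleles are observed (otherwise an expected count is zero).
    Returns (chi-square statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts in controls.
chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi-square = {chi2:.3f}, P = {p_value:.3f}")
```

A small P value flags a departure that may reflect genotyping error or other peculiarities in the data. Whatever test is used, the recommendation is to state it and to say how variants that deviated were handled.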
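The volume-of-data rationale states that a search over hundreds of thousands of variants can be expected by chance alone to yield thousands of false positives. The Python sketch below makes that arithmetic concrete under simplifying assumptions (every null hypothesis true, tests independent) and adds a Bonferroni threshold as one example of the adjustments that items 12(i) and 16(d) ask authors to describe and report. The number of tests and the random seed are illustrative choices.

```python
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected count of P values below alpha when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test level that keeps the family-wise error rate at or below family_alpha."""
    return family_alpha / n_tests


n_tests = 500_000  # illustrative number of variants tested
print("expected false positives at alpha = 0.05:", expected_false_positives(n_tests, 0.05))
print("Bonferroni per-test threshold:", bonferroni_threshold(n_tests))

# Quick simulation: under the null hypothesis, P values are uniform on [0, 1).
rng = random.Random(2009)
null_p_values = [rng.random() for _ in range(n_tests)]
hits = sum(p < 0.05 for p in null_p_values)
bonferroni_hits = sum(p < bonferroni_threshold(n_tests) for p in null_p_values)
print("simulated nominal hits:", hits, "| hits after Bonferroni:", bonferroni_hits)
```

Because nearby variants are correlated through linkage disequilibrium, Bonferroni treats the tests as more independent than they are and is therefore conservative. The reporting point is the same for any method: state which one was used and give its results.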