Muin J Khoury, Lars Bertram, Paolo Boffetta, Adam S Butterworth, Stephen J Chanock, Siobhan M Dolan, Isabel Fortier, Montserrat Garcia-Closas, Marta Gwinn, Julian P T Higgins, A Cecile J W Janssens, James Ostell, Ryan P Owen, Roberta A Pagon, Timothy R Rebbeck, Nathaniel Rothman, Jonine L Bernstein, Paul R Burton, Harry Campbell, Anand Chockalingam, Helena Furberg, Julian Little, Thomas R O'Brien, Daniela Seminara, Paolo Vineis, Deborah M Winn, Wei Yu, John P A Ioannidis.
Abstract
Genome-wide association studies (GWAS) have led to a rapid increase in available data on common genetic variants and phenotypes and numerous discoveries of new loci associated with susceptibility to common complex diseases. Integrating the evidence from GWAS and candidate gene studies depends on concerted efforts in data production, online publication, database development, and continuously updated data synthesis. Here the authors summarize current experience and challenges on these fronts, which were discussed at a 2008 multidisciplinary workshop sponsored by the Human Genome Epidemiology Network. Comprehensive field synopses that integrate many reported gene-disease associations have been systematically developed for several fields, including Alzheimer's disease, schizophrenia, bladder cancer, coronary heart disease, preterm birth, and DNA repair genes in various cancers. The authors summarize insights from these field synopses and discuss remaining unresolved issues, especially in light of evidence from GWAS, for which they summarize empirical P-value and effect-size data on 223 discovered associations for binary outcomes (142 with P < 10^-7). They also present a vision of collaboration that builds reliable cumulative evidence for genetic associations with common complex diseases and a transparent, distributed, authoritative knowledge base on genetic variation and human health. As a next step in the evolution of Human Genome Epidemiology reviews, the authors invite investigators to submit field synopses for possible publication in the American Journal of Epidemiology.
Year: 2009 PMID: 19498075 PMCID: PMC2714948 DOI: 10.1093/aje/kwp119
Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897
Trends in Numbers of Published Articles on Human Genome Epidemiology, Meta-Analyses, and Genome-Wide Association Studies and Numbers of Genes Studied, by Year, 2001–2008
| Year | No. of Genes | No. of Diseases | No. of Articles Published: Total | No. of Articles Published: GWAS | No. of Articles Published: Meta-Analyses |
| 2001 | 633 | 690 | 2,492 | 0 | 34 |
| 2002 | 794 | 855 | 3,196 | 0 | 45 |
| 2003 | 832 | 880 | 3,476 | 3 | 65 |
| 2004 | 1,124 | 1,021 | 4,280 | 0 | 86 |
| 2005 | 1,308 | 1,077 | 5,029 | 5 | 113 |
| 2006 | 1,502 | 1,109 | 5,364 | 12 | 155 |
| 2007 | 2,142 | 1,292 | 7,222 | 104 | 208 |
| 2008 | 3,336 | 1,203 | 7,659 | 134 | 206 |
Abbreviations: HuGE, Human Genome Epidemiology; GWAS, genome-wide association studies.
Data were obtained through a HuGE Navigator query (http://www.hugenavigator.net/) conducted on February 14, 2009.
Does not include the numbers of studied variants per gene, as such information is difficult to obtain. Individual genes were not counted unless they were featured in the paper.
Includes HuGE reviews.
Considerations for Epidemiologic Credibility in the Assessment of Cumulative Evidence on Genetic Associations
| Criteria and Categories | Proposed Operationalization |
| Amount of evidence | |
| A: Large-scale evidence | Thresholds may be defined on the basis of sample size, power, or false-discovery rate considerations. The frequency of the genetic variant of interest should be accounted for. As a simple rule, we suggest that category A require a sample size of more than 1,000 (total number in cases and controls, assuming a 1:1 ratio) evaluated in the least common genetic group of interest; that B correspond to a sample size of 100–1,000 evaluated in this group; and that C correspond to a sample size of less than 100 evaluated in this group (see “Discussion” section in the text and |
| B: Moderate amount of evidence | |
| C: Little evidence | |
| Replication | |
| A: Extensive replication including at least 1 well-conducted meta-analysis with little between-study inconsistency | Between-study inconsistency entails statistical considerations (e.g., defined by metrics such as |
| B: Well-conducted meta-analysis with some methodological limitations or moderate between-study inconsistency | |
| C: No association; no independent replication; failed replication; scattered studies; flawed meta-analysis or large inconsistency | |
| Protection from bias | |
| A: Bias, if at all present, could affect the magnitude but probably not the presence of the association | A prerequisite for A is that the bias due to phenotype measurement, genotype measurement, confounding (population stratification), and selective reporting (for meta-analyses) can be appraised as not being high (as shown in detail in |
| B: No obvious bias that may affect the presence of the association, but there is considerable missing information on the generation of evidence | |
| C: Considerable potential for or demonstrable bias that can affect even the presence or absence of the association |
Based on the Venice criteria (18).
For example, if the association pertains to the presence of homozygosity for a common variant and if the frequency of homozygosity is 3%, then category A under “Amount of evidence” requires over 30,000 subjects and category B between 3,000 and 30,000. The sample size refers to subjects when genotype contrasts are used and to alleles when alleles are contrasted.
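The sample-size rule and the worked example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the cutoffs (more than 1,000 for A, 100–1,000 for B, fewer than 100 for C) are taken from the "Amount of evidence" row. Note that dividing exactly gives about 33,333 and 3,333 subjects for a 3% group frequency, which the footnote rounds to "over 30,000" and "3,000".

```python
def amount_of_evidence_category(n_least_common_group: float) -> str:
    """Grade the amount of evidence from the sample size (cases plus
    controls) evaluated in the least common genetic group of interest."""
    if n_least_common_group > 1000:
        return "A"  # large-scale evidence
    if n_least_common_group >= 100:
        return "B"  # moderate amount of evidence
    return "C"      # little evidence


def total_subjects_needed(group_threshold: float, group_frequency: float) -> float:
    """Total sample size at which the least common genetic group is
    expected to contain `group_threshold` subjects (or alleles)."""
    if not 0 < group_frequency <= 1:
        raise ValueError("group_frequency must be in (0, 1]")
    return group_threshold / group_frequency


if __name__ == "__main__":
    homozygosity_frequency = 0.03  # the 3% example from the footnote
    print(total_subjects_needed(1000, homozygosity_frequency))  # lower bound for A
    print(total_subjects_needed(100, homozygosity_frequency))   # lower bound for B
```

The same functions apply when alleles rather than subjects are contrasted; only the unit being counted changes.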
Key Characteristics of Pilot Field Synopses of Genetic Associations
| Field | No. of Meta-Analyses | No. of Data Sets | Threshold | No. of Statistically Significant Associations | Strong | World Wide Web Address |
| Alzheimer's disease | 228 | 1,072 | 4 data sets | 53 | NA | |
| Schizophrenia | 118 | 1,179 | 4 data sets | 24 | 4 | |
| DNA repair genes and various cancers | 241 | 1,087 | 2 independent teams | 31 | 3 | |
| Bladder cancer | 36 | 356 | 3 data sets | 7 | 1 | Not yet online |
| Coronary heart disease | 48 | 1,039 | 4 | 0 | ||
| Preterm birth | 17 | 87 | 3 data sets | 2 | 0 | |
| Major depression | 22 | 131 | 3 data sets | 6 | 2 | Not yet online |
Abbreviation: NA, not applicable.
Total number of data sets included in the meta-analyses (not including data sets that did not undergo meta-analysis).
Authors’ prerequisite condition for conducting a meta-analysis.
Statistically significant (P < 0.05) by random-effects calculations on the default (per allele) analysis (for coronary heart disease, results are based on a meta-regression model and correspond to effects in the largest studies, while for DNA repair genes, both recessive and dominant models were investigated).
Grade AAA with regard to all 3 Venice criteria (18).
Current on February 27, 2008.
Current on April 30, 2008.
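As a rough illustration of how the three Venice grades combine into the "Strong" column, the sketch below maps a triple of grades to an overall label. Only the rule "strong means AAA" comes from the footnotes here; the "weak" (any C) and "moderate" (everything else) labels are an assumption based on the usual reading of the Venice criteria (18) and should be checked against that source. The function name is hypothetical.

```python
def overall_credibility(amount: str, replication: str, bias: str) -> str:
    """Combine the three Venice grades (each 'A', 'B', or 'C') into an
    overall credibility label."""
    grades = (amount, replication, bias)
    if any(g not in ("A", "B", "C") for g in grades):
        raise ValueError("each grade must be 'A', 'B', or 'C'")
    if all(g == "A" for g in grades):
        return "strong"    # grade AAA, as in the table footnote
    if "C" in grades:
        return "weak"      # assumption: any C grade
    return "moderate"      # assumption: remaining combinations


if __name__ == "__main__":
    print(overall_credibility("A", "A", "A"))
    print(overall_credibility("A", "B", "A"))
    print(overall_credibility("A", "A", "C"))
```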
Some Checks for Retrospective Meta-Analyses in Field Synopses of Genetic Associations
| General checks for the occurrence of or susceptibility to potential problems |
| Small effect size (e.g., odds ratio less than 1.15-fold from the null value) |
| Association lost with exclusion of first study |
| Association lost with exclusion of HWE-violating studies or with adjustment for HWE |
| Evidence for small-study effect in an asymmetry regression test with proper type I error ( |
| Evidence for excess of single studies with formally statistically significant results ( |
| Topic- or subject-specific checks: consider whether they are problems |
| Unclear/misclassified phenotypes with possible differential misclassification against genotyping |
| Differential misclassification of genotyping against phenotypes |
| Major concerns for population stratification (need to justify for affecting odds ratio greater than 1.15-fold; not invoked to date) |
| Any other reason (case-by-case basis) that would render the evidence for association highly questionable |
Abbreviation: HWE, Hardy-Weinberg equilibrium.
All general checks are likely to have only modest, imperfect sensitivity and specificity for detecting problems. In particular, for effect size, a small effect size may very well reflect a true association, since many genetic associations have small effect sizes. However, if this effect has been documented in a retrospective meta-analysis that is susceptible to publication and other reporting biases, it also needs to be replicated in a prospective setting where such biases cannot operate before high credibility can be attributed to it.
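Two of the general checks listed above, the small-effect-size check and the first-study exclusion check, lend themselves to a short sketch. The code below is a hedged illustration rather than the procedure used in any of the field synopses: it assumes the DerSimonian-Laird estimator for the random-effects summary (the text says only "random-effects calculations"), a two-sided 0.05 significance level, and that studies are listed in publication order so that the first entry is the first study. All function names are hypothetical.

```python
import math


def is_small_effect(odds_ratio: float, fold: float = 1.15) -> bool:
    """True if the odds ratio is less than `fold` away from the null
    value of 1 in either direction (e.g., between 1/1.15 and 1.15)."""
    return abs(math.log(odds_ratio)) < math.log(fold)


def random_effects_pool(log_ors, ses):
    """Pooled log odds ratio and its standard error under a
    DerSimonian-Laird random-effects model (an assumed estimator)."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(log_ors) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star))


def is_significant(log_ors, ses, z_crit: float = 1.96) -> bool:
    """Two-sided test of the pooled estimate at roughly P < 0.05."""
    pooled, se = random_effects_pool(log_ors, ses)
    return abs(pooled / se) > z_crit


def lost_without_first_study(log_ors, ses) -> bool:
    """True if the summary association is significant with all studies
    but not after dropping the first (earliest) study."""
    return is_significant(log_ors, ses) and not is_significant(log_ors[1:], ses[1:])


if __name__ == "__main__":
    # Made-up data: a first study with a larger effect than later ones.
    ors = [1.25, 1.10, 1.09]
    ses = [0.07, 0.07, 0.07]
    logs = [math.log(x) for x in ors]
    print(is_small_effect(1.10), lost_without_first_study(logs, ses))
```

As the footnote stresses, a flag from either check signals susceptibility to a problem, not proof of one; a small odds ratio may well reflect a true association.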
Figure 1. A) Levels of statistical significance for associations of genetic loci with P values of 10^-5 or lower identified through genome-wide association studies (GWAS) and entered in the National Human Genome Research Institute catalog of GWAS as of October 14, 2008 (38, 39); data are limited to those loci that have binary outcome phenotypes (n = 223). For details on selection of loci in the catalog, see Hindorff et al. (38) and Manolio et al. (39). B) Odds ratios (per allele) for the 223 associations. C) Odds ratios for the 142 of the 223 associations that had P values less than 10^-7. Not shown are the 5, 13, and 7 outliers that had values outside of the depicted range in the 3 panels, respectively.
Figure 2. A vision for collaboration among disease- and gene-specific investigators, systematic reviewers, and online publishers. HuGENet, Human Genome Epidemiology Network; HVP, Human Variome Project; P3G, Public Population Project in Genomics.