Yudi Pawitan, Ku Chee Seng, Patrik K E Magnusson.
Abstract
A great majority of genetic markers discovered in recent genome-wide association studies have small effect sizes, and they explain only a small fraction of the genetic contribution to the diseases. How many more variants can we expect to discover, and what study sizes are needed? We derive the connection of the cumulative risk of the SNP variants to the latent genetic risk model and the heritability of the disease. We determine the sample size required for case-control studies in order to achieve a certain expected number of discoveries in a collection of the most significant SNPs. Assuming allele frequencies and effect sizes similar to those of the currently validated SNPs, a complex phenotype such as type-2 diabetes would need approximately 800 variants to explain its 40% heritability. Far fewer variants are needed under rare-variant but higher-penetrance models. We estimate that up to 50,000 cases and an equal number of controls are needed to discover 800 common low-penetrance variants among the top 5000 SNPs. Under common and rare low-penetrance models, the very large studies required to discover the numerous variants are probably at the limit of practical feasibility. Under rare-variant models with medium to high penetrance (odds-ratios between 1.6 and 4.0), studies comparable in size to many existing studies are adequate, provided the genotyping technology can interrogate more and rarer variants.
Year: 2009 PMID: 19956539 PMCID: PMC2780697 DOI: 10.1371/journal.pone.0007969
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Distribution of 383 ORs from 101 GWA studies listed in the Supplementary table (Table S1).
Table 1. The top 9 SNPs from [1] (the first 9 in the first column) and 11 SNPs from [1].
| SNP | Freq. | OR | SNP | Freq. | OR |
| rs10811661 | 0.83 | 1.20 | rs12779790 | 0.183 | 1.11 |
| rs4402960 | 0.29 | 1.14 | rs7961581 | 0.269 | 1.09 |
| rs1470579 | 0.30 | 1.17 | rs7578597 | 0.902 | 1.15 |
| rs7754840 | 0.31 | 1.12 | rs4607103 | 0.761 | 1.09 |
| rs1111875 | 0.53 | 1.13 | rs10923931 | 0.106 | 1.13 |
| rs13266634 | 0.65 | 1.12 | rs1153188 | 0.733 | 1.08 |
| rs7903146 | 0.26 | 1.37 | rs17036101 | 0.927 | 1.15 |
| rs5219 | 0.47 | 1.14 | rs2641348 | 0.107 | 1.10 |
| rs1801282 | 0.86 | 1.14 | rs9472138 | 0.282 | 1.06 |
| rs864745 | 0.501 | 1.10 | rs10490072 | 0.724 | 1.05 |
‘Freq.’ refers to the frequency and ‘OR’ the odds-ratio of the risk allele.
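To make the link between the tabulated frequencies and odds-ratios and a cumulative genetic risk more concrete, the following is a minimal sketch rather than the paper's own equations. It assumes an additive score on the log-odds scale, Hardy-Weinberg equilibrium at each SNP, and independence between SNPs (no linkage disequilibrium); the function names are hypothetical. Under those assumptions each SNP contributes a variance of 2p(1 - p)(log OR)^2, where p is the risk-allele frequency.

```python
import math

# (risk-allele frequency, odds-ratio) pairs copied from the table above.
SNPS = {
    "rs10811661": (0.83, 1.20), "rs12779790": (0.183, 1.11),
    "rs4402960": (0.29, 1.14), "rs7961581": (0.269, 1.09),
    "rs1470579": (0.30, 1.17), "rs7578597": (0.902, 1.15),
    "rs7754840": (0.31, 1.12), "rs4607103": (0.761, 1.09),
    "rs1111875": (0.53, 1.13), "rs10923931": (0.106, 1.13),
    "rs13266634": (0.65, 1.12), "rs1153188": (0.733, 1.08),
    "rs7903146": (0.26, 1.37), "rs17036101": (0.927, 1.15),
    "rs5219": (0.47, 1.14), "rs2641348": (0.107, 1.10),
    "rs1801282": (0.86, 1.14), "rs9472138": (0.282, 1.06),
    "rs864745": (0.501, 1.10), "rs10490072": (0.724, 1.05),
}


def log_odds_variance(freq, odds_ratio):
    """Variance one SNP adds to an additive log-odds score.

    Assumption: the risk-allele count is Binomial(2, freq), i.e. the SNP
    is in Hardy-Weinberg equilibrium, so its variance is 2*freq*(1-freq).
    """
    beta = math.log(odds_ratio)  # per-allele effect on the log-odds scale
    return 2.0 * freq * (1.0 - freq) * beta ** 2


def cumulative_log_odds_variance(snps):
    """Sum of per-SNP variances. Assumption: SNPs are independent."""
    return sum(log_odds_variance(f, o) for f, o in snps.values())


if __name__ == "__main__":
    per_snp = {name: log_odds_variance(f, o) for name, (f, o) in SNPS.items()}
    largest = max(per_snp, key=per_snp.get)
    total = cumulative_log_odds_variance(SNPS)
    print(f"largest single contribution: {largest}")
    print(f"cumulative variance on the log-odds scale: {total:.4f}")
```

Converting this variance into a heritability requires a further modelling choice (for example a liability or logistic latent scale), which this sketch deliberately leaves out.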
Figure 2. Distribution of latent genetic risk derived for the type-2 diabetes example, computed using (1) and (2).
Figure 3. The number of variants required to explain the corresponding heritability.
The labels A–F refer to the genetic models given in Table 2.
Table 2. Various models of genetic architecture and the number of variants needed to explain a heritability of 0.4.
| Scenario (frequency, effect size) | Range of MAFs | Range of ORs | Number of variants for heritability 0.4 |
| A. Common-low | 0.073–0.499 | 1.05–1.15 | 812 |
| B. Modest-low | 0.0365–0.2495 | 1.05–1.15 | 1368 |
| C. Rare-low | 0.0146–0.0998 | 1.05–1.15 | 3114 |
| D. Rare-medium | 0.0146–0.0998 | 1.28–2.01 | 144 |
| E. Rare-high | 0.0073–0.0499 | 1.63–4.05 | 80 |
| F. Very-rare-high | 0.00073–0.00499 | 1.63–4.05 | 608 |
Figure 4. The expected number of discoveries of causal variants as a function of the number of cases in a case-control study, with an equal number of controls.
The models refer to those in Table 2 in terms of the range of MAFs and ORs of the risk alleles of non-null variants. For models A and B, we plot the expected number of discoveries among the top 1000 (solid), 2000 (dashed) and 5000 SNPs (dotted); for models D and E, we plot the expected number among the top 100 (solid), 200 (dashed) and 500 SNPs (dotted).
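As a rough illustration of why the number of cases matters so much for small odds-ratios, the sketch below computes the approximate power to detect a single SNP. It is not the expected-discoveries calculation behind the figure: it assumes a two-sided two-proportion z-test on allele counts, equal numbers of cases and controls, a per-allele odds-ratio, and the conventional genome-wide threshold of 5e-8, none of which are taken from the source. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds-ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided z-test comparing allele frequencies.

    Assumptions: n_cases cases and n_cases controls, two alleles per
    person, and a normal approximation to the allele-frequency difference.
    alpha defaults to a conventional genome-wide threshold (an assumption).
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    n_alleles = 2 * n_cases  # alleles sampled in each group
    se = math.sqrt((p0 * (1.0 - p0) + p1 * (1.0 - p1)) / n_alleles)
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(p1 - p0) / se  # expected test statistic under the alternative
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Illustrative inputs only: a common variant with a small effect.
    for n in (2000, 10000, 50000):
        power = allelic_test_power(n, control_freq=0.3, odds_ratio=1.1)
        print(f"{n:>6} cases: power = {power:.4f}")
```

Because this treats each SNP in isolation, it says nothing about ranking among the top SNPs; it only shows how per-SNP power rises with the number of cases.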