Sharon R Browning, Elizabeth A Thompson.
Abstract
Identity-by-descent (IBD) mapping tests whether cases share more segments of IBD around a putative causal variant than do controls. These segments of IBD can be accurately detected from genome-wide SNP data. We investigate the power of IBD mapping relative to that of SNP association testing for genome-wide case-control SNP data. Our focus is particularly on rare variants, as these tend to be more recent and hence more likely to have recent shared ancestry. We simulate data from both large and small populations and find that the relative performance of IBD mapping and SNP association testing depends on population demographic history and the strength of selection against causal variants. We also present an IBD mapping analysis of a type 1 diabetes data set. In those data we find that we can detect association only with the HLA region using IBD mapping. Overall, our results suggest that IBD mapping may have higher power than association analysis of SNP data when multiple rare causal variants are clustered within a gene. However, for outbred populations, very large sample sizes may be required for genome-wide significance unless the causal variants have strong effects.
Year: 2012 PMID: 22267498 PMCID: PMC3316661 DOI: 10.1534/genetics.111.136937
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1 Simulation scheme. Each simulated region is made up of 100 simulated segments of length 1 kb with gaps of length 1 kb between them. The central five segments can contain causal SNPs. Causal SNPs are those that the simulation program designates as protein-changing mutations. These SNPs have been subject to negative selection at a specified rate. Only the causal SNPs and one SNP per segment with highest minor allele frequency (MAF) are retained. The causal SNPs are used to determine disease status, while the high MAF SNPs are tested in the association analysis. IBD status is determined through further simulation, as described in the main text.
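The region layout described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the geometry (100 segments of 1 kb separated by 1 kb gaps, with the central five segments eligible to carry causal SNPs), not the authors' simulation program. The function name `segment_layout`, the 0-based coordinates, and the choice of which five segments count as "central" (indices 47 to 51, since five segments cannot be exactly centered among 100) are assumptions made for the example.

```python
SEGMENT_LEN = 1_000   # length of each simulated segment, in bp (from the caption)
GAP_LEN = 1_000       # length of the gap between adjacent segments, in bp
N_SEGMENTS = 100      # segments per simulated region
N_CAUSAL = 5          # central segments that may contain causal SNPs


def segment_layout():
    """Return a list of (start, end, can_be_causal) tuples, one per segment.

    Coordinates are 0-based and half-open; this convention is an assumption.
    """
    # Assumed definition of "central": the block of five starting at index 47.
    first_causal = (N_SEGMENTS - N_CAUSAL) // 2
    segments = []
    for i in range(N_SEGMENTS):
        start = i * (SEGMENT_LEN + GAP_LEN)
        end = start + SEGMENT_LEN
        can_be_causal = first_causal <= i < first_causal + N_CAUSAL
        segments.append((start, end, can_be_causal))
    return segments


layout = segment_layout()
region_length = layout[-1][1]  # end coordinate of the last segment
```

Under these assumptions the region spans 199 kb: 100 segments of 1 kb plus 99 gaps of 1 kb.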
Properties of simulated causal variants
| s | No. of variants | Variant frequencies | Haplotype carrier frequencies | Max r² |
|---|---|---|---|---|
| 0.0005 | 11–16 | 0.00015–0.0060 | 0.045–0.13 | 0.91–1.00 |
| 0.001 | 9–14 | 0.00010–0.0031 | 0.019–0.050 | 0.28–1.00 |
| 0.002 | 8–13 | 0.00010–0.0020 | 0.0097–0.031 | 0.06–0.52 |
| 0.005 | 7–10 | 0.000088–0.0011 | 0.0045–0.011 | 0.03–0.16 |
Interquartile ranges (IQR; lower quartile to upper quartile) from the 100 simulations with selection coefficient s are shown for several quantities of interest. The second column gives the number of causal variants per simulation. The third column gives the frequencies of the causal variants. The fourth column gives the proportion of haplotypes that carry a causal variant. The final column gives the maximum squared correlation coefficient between any one of the 100 common variants tested in the association test and any one of the causal variants. All results are from the base simulation population of 10,000 individuals.
Simulated power results: Large population size
| s | No. of cases | No. of controls | Power assoc. | Power IBD25 | Power IBD100 | Assoc. v. IBD25 | Assoc. v. IBD100 |
|---|---|---|---|---|---|---|---|
| 0.0005 | 500 | 500 | 0.87 | 0.57 | 0.81 | Assoc. | NS |
| 0.001 | 500 | 500 | 0.65 | 0.53 | 0.81 | NS | IBD |
| 0.002 | 1000 | 1000 | 0.53 | 0.87 | 0.93 | IBD | IBD |
| 0.005 | 3000 | 3000 | 0.47 | 0.90 | 0.84 | IBD | IBD |
From an equilibrium population size of N = 10,000, the population was expanded to a recent effective size of Nrecent = 100,000. The selection coefficient, s, used in simulating the equilibrium population is given in the first column. The second and third columns give the sample sizes. The fourth column gives the estimated power of the SNP association test with G = 25 generations at the recent effective population size; the power of the SNP association test with G = 100 generations was not significantly different (data not shown). The fifth and sixth columns give the estimated power of pairwise IBD tests with IBD determined from G = 25 and G = 100 generations at the recent effective population size, respectively. All power estimates are from 100 replicates; the standard errors are 0.03–0.05. The seventh column states whether the SNP association test or IBD test with G = 25 is more powerful if the difference is significant (two-sided paired t-test P < 0.05) or NS (nonsignificant) otherwise. Similarly the eighth column compares the SNP association test and IBD test with G = 100.
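The stated standard errors of 0.03–0.05 are what one would expect if each power estimate is treated as a binomial proportion over 100 independent replicates. The sketch below checks that reading against a few of the tabulated power values. The binomial formula is an assumption about how the standard errors arise (the caption only reports the range), and `power_standard_error` is a name introduced here for illustration.

```python
import math


def power_standard_error(power, n_replicates=100):
    """Binomial standard error of an estimated power (assumed model)."""
    return math.sqrt(power * (1.0 - power) / n_replicates)


# A few power estimates taken from the large-population table.
tabulated_powers = [0.87, 0.57, 0.81, 0.65, 0.53, 0.47, 0.90, 0.84]

# Standard errors rounded to two decimals, as they would appear in a caption.
rounded_errors = [round(power_standard_error(p), 2) for p in tabulated_powers]
```

Under this assumption the standard error is largest (0.05) for power near 0.5 and shrinks toward 0.03 as the power approaches 0.9.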
Simulated power results: Small population size, very recent bottleneck
| s | No. of cases | No. of controls | Power assoc. | Power IBD25 | Assoc. v. IBD25 |
|---|---|---|---|---|---|
| 0.0005 | 200 | 200 | 0.53 | 0.64 | IBD |
| 0.001 | 400 | 400 | 0.60 | 0.73 | IBD |
| 0.002 | 400 | 600 | 0.51 | 0.60 | NS |
| 0.005 | 400 | 1000 | 0.33 | 0.46 | IBD |
From an equilibrium population size of N = 10,000, the population was contracted 25 generations ago to a recent effective size of Nrecent = 1000. The selection coefficient, s, used in simulating the equilibrium population is given in the first column. The second and third columns give the sample sizes. The fourth column gives the estimated power of the SNP association test, while the fifth column gives the estimated power of the pairwise IBD test with IBD determined from the final G = 25 generations. All power estimates are from 100 replicates; the standard errors are 0.04–0.05. The sixth column states whether the SNP association test or IBD test is more powerful if the difference is significant (two-sided paired t-test P < 0.05) or NS (nonsignificant) otherwise.
Simulated power results: Small population size, older bottleneck
| s | No. of cases | No. of controls | Power assoc. | Power IBD25 | Assoc. v. IBD25 |
|---|---|---|---|---|---|
| 0.0005 | 200 | 200 | 0.71 | 0.55 | Assoc. |
| 0.001 | 400 | 400 | 0.76 | 0.67 | NS |
| 0.002 | 400 | 600 | 0.76 | 0.57 | Assoc. |
| 0.005 | 400 | 1000 | 0.73 | 0.51 | Assoc. |
From an equilibrium population size of N = 10,000, the population was contracted 125 generations ago to a recent effective size of Nrecent = 1000. The selection coefficient, s, used in simulating the equilibrium population is given in the first column. The second and third columns give the sample sizes. The fourth column gives the estimated power of the SNP association test, while the fifth column gives the estimated power of pairwise IBD tests with IBD determined from the final G = 25 generations. All power estimates are from 100 replicates; the standard errors are 0.04–0.05. The sixth column states whether the SNP association or IBD test is more powerful if the difference is significant (two-sided paired t-test P < 0.05) or NS (nonsignificant) otherwise.
Figure 2 Distribution of lengths of detected IBD segments in the WTCCC type 1 diabetes data. IBD segments were detected using BEAGLE fastIBD. Lengths greater than 8 cM are not shown.
Figure 3 Permutation P-values for the IBD test in the WTCCC type 1 diabetes data. P-values were calculated at every tenth marker along the autosomes. The smallest possible P-value from the 5,000,000 permutations (2 × 10⁻⁷) is shown by the black horizontal line. The genome-wide significance level determined by 1000 permutations (6 × 10⁻⁶) is shown by the blue horizontal line.
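The floor on the permutation P-value follows directly from the number of permutations: with 5,000,000 permutations, the smallest attainable value is 1/5,000,000. The sketch below shows this arithmetic together with a generic empirical P-value calculation. It is not the authors' test statistic or code; the function names and the convention of flooring the exceedance count at 1 (so the P-value is never reported as zero) are assumptions made for illustration.

```python
def min_permutation_p(n_permutations):
    """Smallest P-value resolvable with n_permutations permutations."""
    return 1.0 / n_permutations


def permutation_p_value(observed, permuted_stats):
    """Empirical one-sided P-value for an observed statistic.

    Counts permuted statistics at least as large as the observed one.
    The count is floored at 1 (assumed convention) so that the result
    never falls below 1 / len(permuted_stats).
    """
    exceed = sum(1 for s in permuted_stats if s >= observed)
    return max(exceed, 1) / len(permuted_stats)


floor_p = min_permutation_p(5_000_000)  # P-value floor for 5,000,000 permutations
```

For example, `permutation_p_value(10, [1, 2, 3, 4])` returns 0.25, the floor for four permutations, because no permuted statistic reaches the observed value.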