Kevin B Jacobs, Meredith Yeager, Sholom Wacholder, David Craig, Peter Kraft, David J Hunter, Justin Paschal, Teri A Manolio, Margaret Tucker, Robert N Hoover, Gilles D Thomas, Stephen J Chanock, Nilanjan Chatterjee.
Abstract
Aggregate results from genome-wide association studies (GWAS), such as genotype frequencies for cases and controls, were until recently often made available on public websites because they were thought to disclose negligible information concerning an individual's participation in a study. Homer et al. recently suggested that a method for forensic detection of an individual's contribution to an admixed DNA sample could be applied to aggregate GWAS data. Using a likelihood-based statistical framework, we developed an improved statistic that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is. Our statistic compares the logarithm of genotype frequencies, in contrast to that of Homer et al., which is based on differences in either SNP probe intensity or allele frequencies. We derive the theoretical power of our test statistics and explore the empirical performance in scenarios with varying numbers of randomly chosen or top-associated SNPs.
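The abstract describes a statistic that compares the logarithm of genotype frequencies between groups. The exact definition of Tgeno is not given in this record, so the following is only a minimal sketch of one plausible form: for each SNP, take the log frequency of the individual's genotype in the test group minus the log frequency in the reference group, and sum over SNPs. The function name `log_genotype_ratio`, the `floor` parameter, and the toy frequencies are assumptions made for illustration.

```python
import math

def log_genotype_ratio(genotypes, test_freqs, ref_freqs, floor=1e-6):
    """Sum over SNPs of log(test frequency) - log(reference frequency)
    for the genotype the individual carries (0, 1 or 2 minor alleles).

    Hypothetical helper: a sketch of a log-frequency comparison,
    not the authors' exact Tgeno.
    """
    total = 0.0
    for g, f_test, f_ref in zip(genotypes, test_freqs, ref_freqs):
        # Floor the frequencies so an unobserved genotype does not give log(0).
        total += math.log(max(f_test[g], floor)) - math.log(max(f_ref[g], floor))
    return total

# Toy input: two SNPs, frequencies ordered as (0, 1, 2 minor alleles).
test_freqs = [(0.50, 0.40, 0.10), (0.60, 0.30, 0.10)]
ref_freqs = [(0.55, 0.35, 0.10), (0.60, 0.30, 0.10)]

# The individual is heterozygous at SNP 1 and homozygous major at SNP 2.
score = log_genotype_ratio([1, 0], test_freqs, ref_freqs)
print(score)
```

In this toy input only the first SNP contributes, because the second SNP has identical frequencies in both groups. A positive score means the individual's genotypes are, on balance, more common in the test group than in the reference group.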
Year: 2009 PMID: 19801980 PMCID: PMC2803072 DOI: 10.1038/ng.455
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Theoretical power of Tgeno to detect an individual in the Test group.
| a. 1,000 samples/group, number of loci varied | | | | | Power of Tgeno at Significance Level | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Test group | Ref group | Ind. SNPs | Error rate | MAF | 0.05 | 0.01 | 0.001 | 10⁻⁴ | 10⁻⁵ | 10⁻⁶ |
| 1,000 | 1,000 | | 0.00 | 0.25 | 0.62 | 0.38 | 0.15 | 0.05 | 0.02 | 0.00 |
| 1,000 | 1,000 | | 0.00 | 0.25 | 0.89 | 0.73 | 0.45 | 0.24 | 0.11 | 0.04 |
| 1,000 | 1,000 | | 0.00 | 0.25 | 1.00 | 0.99 | 0.96 | 0.86 | 0.72 | 0.54 |
| 1,000 | 1,000 | | 0.00 | 0.25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 |
| 1,000 | 1,000 | | 0.00 | 0.25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Scenarios examined: (a) Hypothetical GWAS with 1,000 cases (test group) and 1,000 controls (reference group), no genotyping error, and 5,000 to 200,000 independent SNPs with fixed MAF of 25%. The upper bound on the number of SNPs was chosen based on estimates (unpublished) that the Illumina HumanHap550 assay provides information equivalent to ~200,000-300,000 independent SNPs in populations of European descent. A MAF of 25% was chosen based on a survey of several fixed-content assays which were found to have average MAFs ranging from 20% to 25% in populations of European descent. (b) As for (a) with 5,000 cases and 5,000 controls. (c) As for (a) with genotype discordance rates from 0% to 25% and 50,000 independent loci. (d) As for (a) with varying MAF from 5% to 50% for 25,000 independent loci. (e) For varying sizes of the reference group from 60 (the size of the HapMap CEU founder population) to 10,000 (near perfect estimation of the genotype frequencies) for a fixed MAF of 10%.
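Scenario (a) can be mimicked at small scale to see why a participant stands out. The sketch below assumes Hardy-Weinberg genotype frequencies at a MAF of 25%, uses far smaller groups (50 per group) and fewer SNPs (5,000) than the table so that plain Python runs quickly, and scores individuals with a simple sum of log genotype-frequency differences. That score stands in for Tgeno and is not the authors' exact statistic. All function names, the pseudocount, and the random seed are assumptions.

```python
import math
import random

def hwe_frequencies(maf):
    """Genotype frequencies (0, 1, 2 minor alleles) under Hardy-Weinberg equilibrium."""
    major = 1.0 - maf
    return (major * major, 2.0 * maf * major, maf * maf)

def simulate_person(rng, n_snps, maf):
    # Each genotype is the number of minor alleles from two independent draws.
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]

def group_frequencies(group, n_snps, pseudocount=0.5):
    """Per-SNP genotype frequencies; the pseudocount (an assumption) avoids log(0)."""
    freqs = []
    for j in range(n_snps):
        counts = [pseudocount] * 3
        for person in group:
            counts[person[j]] += 1
        total = sum(counts)
        freqs.append([c / total for c in counts])
    return freqs

def log_ratio_score(person, test_freqs, ref_freqs):
    # Stand-in for Tgeno: sum of log(test freq) - log(reference freq).
    return sum(math.log(t[g]) - math.log(r[g])
               for g, t, r in zip(person, test_freqs, ref_freqs))

rng = random.Random(2009)
N_PER_GROUP, N_SNPS, MAF = 50, 5000, 0.25  # scaled down from scenario (a)

cases = [simulate_person(rng, N_SNPS, MAF) for _ in range(N_PER_GROUP)]
controls = [simulate_person(rng, N_SNPS, MAF) for _ in range(N_PER_GROUP)]
outsider = simulate_person(rng, N_SNPS, MAF)

case_freqs = group_frequencies(cases, N_SNPS)
control_freqs = group_frequencies(controls, N_SNPS)

member_score = log_ratio_score(cases[0], case_freqs, control_freqs)
outsider_score = log_ratio_score(outsider, case_freqs, control_freqs)

print(hwe_frequencies(MAF))
print(member_score, outsider_score)
```

A member's own genotypes slightly inflate the matching frequencies in their group at every SNP, and these small shifts add up across thousands of independent loci. Under this sketch a case would be expected to score well above an outsider, and a control well below, which is the intuition behind the rise in power as the number of independent SNPs grows.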
Theoretical power of Tgeno to detect a relative of an individual in the Test group.
| f. 1,000 samples/group, detection of a parent/offspring | | | | | Power of Tgeno at Significance Level | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Test group | Ref group | Ind. SNPs | Parent/ Offspring | MAF | 0.05 | 0.01 | 0.001 | 10⁻⁴ | 10⁻⁵ | 10⁻⁶ |
| 1,000 | 1,000 | 200,000 | 0.10 | | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1,000 | 1,000 | 200,000 | 0.18 | | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1,000 | 1,000 | 200,000 | 0.32 | | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 |
| 1,000 | 1,000 | 200,000 | 0.42 | | 1.00 | 1.00 | 0.97 | 0.91 | 0.79 | 0.63 |
| 1,000 | 1,000 | 200,000 | 0.48 | | 0.98 | 0.92 | 0.77 | 0.55 | 0.35 | 0.20 |
| 1,000 | 1,000 | 200,000 | 0.50 | | 0.93 | 0.82 | 0.58 | 0.35 | 0.19 | 0.09 |
Scenarios examined: (f) Hypothetical GWAS data with 1,000 cases (test group) and 1,000 controls (reference group) where the individual tested is the parent or offspring of a member of the case group, 200,000 independent SNPs, and MAF from 5% to 50%. (g) As for Table 1 (a) except the individual tested is a sibling of a single member of the test group, 200,000 independent loci, and MAF from 5% to 50%.
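One way to see why a parent or offspring of a participant can still be detected is that the two share exactly one allele at every SNP, so the relative's genotypes remain correlated with the participant's. The sketch below computes the genotype distribution of an offspring given one parent's genotype, assuming Mendelian transmission and a second allele drawn at random from a population with the stated MAF. The function name `offspring_distribution` is hypothetical, and this is standard genetics background rather than a calculation taken from the paper.

```python
def offspring_distribution(parent_genotype, maf):
    """P(offspring carries 0, 1, 2 minor alleles | one parent's genotype).

    Assumes the other parent's allele is a random draw with minor
    allele frequency `maf` (random mating).
    """
    # Probability that this parent transmits a minor allele.
    p_transmit_minor = parent_genotype / 2.0
    p_transmit_major = 1.0 - p_transmit_minor
    return (
        p_transmit_major * (1.0 - maf),                     # 0 minor alleles
        p_transmit_major * maf + p_transmit_minor * (1.0 - maf),  # 1 minor allele
        p_transmit_minor * maf,                             # 2 minor alleles
    )

for g in (0, 1, 2):
    print(g, offspring_distribution(g, 0.25))
```

Comparing these conditional distributions with the population frequencies shows how much information a relative's genotype carries about the participant's genotype at a single SNP.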
Figure 1. Histogram of Tgeno for a GWAS with 1,000 cases and controls
The figure presents data using 1,000 cases (group 1 in red), 1,000 controls (group 2 in blue) and 1,000 subjects not in the study, based on genotypes from the Illumina HumanHap550 assay. The theoretical null density curve is shown in black.
Figure 2. Histograms of calibrated Tgeno and Homer’s Tallele with 1,000 cases and controls and varying numbers of SNPs
The figure presents theoretical null density curves (black) for a GWAS with 1,000 cases (group 1 in red), 1,000 controls (group 2 in blue) and 12,000 subjects not in the study (in gray) using genotypes for (a) 10,000, (b) 100,000, and (c) 550,000 top associated SNPs from the Illumina HumanHap550 assay. Statistics were calibrated so that the null distribution was centered at zero with unit variance.
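The calibration mentioned in the caption (null distribution centered at zero with unit variance) can be illustrated as follows. This sketch standardizes scores using the sample mean and standard deviation of non-participants, which is an assumption: the caption refers to theoretical null curves, and the authors may have calibrated with theoretical moments instead. The one-sided threshold further assumes a standard normal null. The function names and toy scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def calibrate(scores, null_scores):
    """Shift and scale scores so the null sample has mean 0 and unit variance."""
    mu = mean(null_scores)
    sigma = stdev(null_scores)
    return [(s - mu) / sigma for s in scores]

def one_sided_threshold(alpha):
    """Critical value z with P(Z > z) = alpha, assuming a standard normal null."""
    return NormalDist().inv_cdf(1.0 - alpha)

# Toy raw scores for subjects not in the study (made-up numbers).
null_raw = [-12.0, -3.5, 0.8, 4.1, 9.6, -6.2, 2.7, 5.0]
calibrated_null = calibrate(null_raw, null_raw)

print(one_sided_threshold(0.05))  # about 1.645
print(calibrated_null)
```

After calibration, a score can be compared directly against thresholds such as those for the significance levels in the power tables.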
Figure 3. Sensitivity and specificity of Tgeno applied to GWAS data
Log-scale Receiver Operating Characteristic (ROC) curves of Tgeno with Illumina HumanHap550 data from GWAS scenarios with 1,000/1,000 and 5,000/5,000 cases and controls of European descent.
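An ROC curve of this kind can be built from two sets of scores, one for study participants and one for non-participants, by sweeping a decision threshold. The sketch below is a generic construction with made-up scores and a hypothetical function name `roc_points`; it is not the authors' plotting code. A log-scale version would simply plot the false positive rate on a logarithmic axis.

```python
def roc_points(participant_scores, outsider_scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    thresholds = sorted(set(participant_scores) | set(outsider_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # Call an individual a participant when the score is at least t.
        tpr = sum(s >= t for s in participant_scores) / len(participant_scores)
        fpr = sum(s >= t for s in outsider_scores) / len(outsider_scores)
        points.append((fpr, tpr))
    return points

# Toy calibrated scores (illustrative values only).
participants = [3.1, 2.4, 0.9]
outsiders = [1.2, -0.3, -1.0]

points = roc_points(participants, outsiders)
for fpr, tpr in points:
    print(fpr, tpr)
```

Sensitivity is the true positive rate and specificity is one minus the false positive rate, so each point on the curve corresponds to one choice of significance threshold.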