High-Seng Chai, Hugues Sicotte, Kent R Bailey, Stephen T Turner, Yan W Asmann, Jean-Pierre A Kocher.
Abstract
BACKGROUND: The development of high-throughput genotyping technologies, which enable the simultaneous genotyping of hundreds of thousands of single nucleotide polymorphisms (SNPs), has the potential to increase the benefits of genetic epidemiology studies. Although the enhanced resolution of these platforms increases the chance of interrogating functional SNPs that are themselves causative or in linkage disequilibrium with causal SNPs, commonly used single SNP-association approaches suffer from serious multiple hypothesis testing problems and provide limited insight into combinations of loci that may contribute to complex diseases. Drawing inspiration from Gene Set Enrichment Analysis developed for gene expression data, we have developed a method, named GLOSSI (Gene-loci Set Analysis), that integrates prior biological knowledge into the statistical analysis of genotyping data to test the association of a group of SNPs (loci-set) with complex disease phenotypes. The most significant loci-sets can be used to formulate hypotheses from a functional viewpoint that can be validated experimentally.
Year: 2009 PMID: 19344520 PMCID: PMC2678095 DOI: 10.1186/1471-2105-10-102
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Parameter specification in the simulated examples
| Scenario | Number of causal SNPs | RR (OR)* | MAF† of causal SNPs | Whether causal SNPs were 'genotyped' |
| 1–3 | 1, 5 or 20 | 1.07 (1.10) | 0.25 | Yes |
| 4–6 | 1, 5 or 20 | 1.34 (1.52) | 0.05 | Yes |
| 7–9 | 1, 5 or 20 | 1.34 (1.50) | 0.25 | Yes |
| 10–12 | 1, 5, or 20 | 1.61 (2.00) | 0.05 | Yes |
| 13–15 | 1, 5, or 20 | 1.61 (1.94) | 0.25 | Yes |
| 16–18 | 1 or 5 | 2.00 (2.67) | 0.25 | Yes |
| 19–21 | 1, 5, or 17‡ | 1.61 (~1.94) | ~0.25 | No |
*RR (OR) = relative risk (odds ratio) when a locus carries two disease alleles, assuming an additive model; †MAF = minor allele frequency in the Utah samples of the HapMap project. ‡Three of the 20 original causal SNPs, generated under the case where MAF equals 0.25, are not in high LD with any SNP genotyped in the HapMap phase I/II project. Disease prevalence and crossover rate are fixed at 25% and 1.0 centiMorgan, respectively, in all simulations.
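The RR and OR columns can be related with a little arithmetic. The sketch below is one possible reading of the footnote, not the authors' simulation code: it assumes a single causal SNP in Hardy-Weinberg equilibrium, a risk that rises linearly with the number of disease alleles, and a baseline risk chosen so that overall prevalence equals 25%. The function name `odds_ratio_from_rr` is hypothetical.

```python
def odds_ratio_from_rr(rr, maf, prevalence=0.25):
    """Odds ratio for carriers of two disease alleles versus none.

    Assumptions (not stated in the source): one causal SNP, Hardy-Weinberg
    genotype frequencies, and risk that increases linearly with allele count.
    """
    # Genotype frequencies for 0, 1 and 2 copies of the disease allele.
    freqs = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    # Relative risks under the assumed additive model: 1, midpoint, rr.
    rel_risks = [1.0, 1.0 + (rr - 1.0) / 2.0, rr]
    # Baseline risk that makes the population prevalence come out right.
    mean_rr = sum(f * r for f, r in zip(freqs, rel_risks))
    baseline = prevalence / mean_rr
    risk_two = baseline * rr
    return (risk_two / (1 - risk_two)) / (baseline / (1 - baseline))


for rr, maf in [(2.00, 0.25), (1.61, 0.05), (1.61, 0.25)]:
    print(rr, maf, round(odds_ratio_from_rr(rr, maf), 2))
```

Under these assumptions the three calls give about 2.67, 2.00 and 1.94, in line with the corresponding rows of the table. Agreement for the RR = 1.34 rows is only approximate, so the simplification should not be read as the exact procedure used.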
Figure 1. Histogram of p-values acquired under the null hypothesis (Scenario 0) based on 1000 simulated data sets of 200 cases and 200 controls. The dashed line is the expected theoretical height of a bar if no SNP in the loci-set was related to the case-control labels.
Estimated type I error rates for GLOSSI in the simulated examples
| Total sample size | Nominal rate, α | Proportion of p-value < α |
| 400 | 0.05 | 0.057 |
| 400 | 0.01 | 0.011 |
| 400 | 0.001 | 0 |
| 1200 | 0.05 | 0.045 |
| 1200 | 0.01 | 0.012 |
| 1200 | 0.001 | 0.002 |
| 2000 | 0.05 | 0.049 |
| 2000 | 0.01 | 0.010 |
| 2000 | 0.001 | 0.003 |
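Each rate in this table is the proportion of null-hypothesis p-values that fall below a nominal level. A minimal sketch of that calculation follows. It uses uniform random numbers as stand-in p-values, since the actual p-values from the simulated data sets are not reproduced here, and `empirical_rate` is a hypothetical helper name.

```python
import random


def empirical_rate(p_values, alpha):
    """Fraction of p-values strictly below the nominal level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


random.seed(0)
# Stand-in for p-values from 1000 null data sets: uniform on [0, 1).
null_p = [random.random() for _ in range(1000)]

for alpha in (0.05, 0.01, 0.001):
    print(alpha, empirical_rate(null_p, alpha))
```

If each row is based on 1000 data sets, as in Figure 1, only about one p-value is expected below 0.001, so estimates at that level are necessarily coarse.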
Figure 2. Statistical power estimated using 200 cases and 200 controls across a range of experimental settings. Each number within the plot marks one simulated example: its x and y coordinates give the number of causal SNPs and the power, respectively, and the number itself is the relative risk (RR). Examples with the same RR and an MAF of 0.25 are linked by solid lines, while those with an MAF of 0.05 are joined by dashed lines. The lines are colored grey if the causal SNPs were not genotyped and black otherwise.
Figure 3. Plot of power versus sample size. Only scenarios surpassing 80% power with 2000 samples are illustrated, except for Scenarios 20 and 21, whose curves closely resemble those of Scenarios 14 and 15. Integers within the plot denote the scenario number (see Table 1).
Estimated type I error rates for the modified KS statistic
| Type of permutation | Relative size of genes: out/in loci-set | Proportion of p-value < 0.05 | 95% CI |
| Phenotype | 50% | 0.052 | (0.038, 0.066) |
| Phenotype | 100% | 0.053 | (0.039, 0.067) |
| Phenotype | 200% | 0.051 | (0.037, 0.065) |
| Gene | 50% | 0.27 | (0.24, 0.29) |
| Gene | 100% | 0.046 | (0.033, 0.059) |
| Gene | 200% | 0.004 | (0, 0.008) |
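The confidence intervals in this table are consistent with a normal-approximation (Wald) interval for a proportion based on 1000 replicates. Both the interval type and the replicate count are assumptions inferred from the numbers; neither is stated alongside the table. `wald_ci` is a hypothetical helper.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Assumed replicate count; not given alongside the table.
n_replicates = 1000
for p_hat in (0.052, 0.046, 0.004):
    low, high = wald_ci(p_hat, n_replicates)
    print(p_hat, round(low, 3), round(high, 3))
```

With these assumptions the sketch gives roughly (0.038, 0.066), (0.033, 0.059) and (0, 0.008) for the three proportions shown, which agree with the tabulated intervals.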
Power of the modified KS statistic when two genes in the reference set contain a causal SNP
| Number of genes outside of loci-set | Relative size of genes: out/in loci-set | Proportion of p-value < 0.05 (Scenario 17) | Proportion of p-value < 0.05 (Scenario 18) |
| 176 | 50% | 0.464 | 0.842 |
| 176 | 100% | 0.581 | 0.854 |
| 176 | 200% | 0.684 | 0.860 |
| 88 | 50% | 0.371 | 0.804 |
| 88 | 100% | 0.495 | 0.829 |
| 88 | 200% | 0.629 | 0.854 |
| 44 | 50% | 0.145 | 0.683 |
| 44 | 100% | 0.187 | 0.732 |
| 44 | 200% | 0.161 | 0.773 |
Loci-sets with unadjusted p-value no greater than 0.1% in the antihypertensive response example
| Loci-set | MSigDB ID | No. of SNPs | No. of relevant genes | p-value | q-value |
| TPO signaling pathway | c2:338 | 48 | 10 | 0.0001 | 0.035 |
| Erk1/Erk2 Mapk signaling pathway | c2:178 | 74 | 16 | 0.0001 | 0.035 |
| Sprouty regulation of tyrosine kinase signals | c2:316 | 36 | 10 | 0.0001 | 0.035 |
| Multiple antiapoptotic pathways from IGF-1R signaling lead to BAD phosphorylation | c2:214 | 24 | 8 | 0.0002 | 0.035 |
| PTEN pathway | c2:557 | 30 | 8 | 0.0002 | 0.035 |
| Transcription factor CREB and its extracellular signals | c2:152 | 83 | 16 | 0.0002 | 0.035 |
| Growth hormone signaling pathway | c2:198 | 50 | 11 | 0.0002 | 0.035 |
| PTEN dependent cell cycle arrest and apoptosis | c2:292 | 24 | 8 | 0.0003 | 0.035 |
| Upregulated in acute rejection transplanted kidney biopsies | c2:834 | 132 | 25 | 0.0003 | 0.035 |
| IL 3 signaling pathway | c2:223 | 18 | 6 | 0.0003 | 0.035 |
| TrkA receptor signaling pathway | c2:339 | 41 | 5 | 0.0003 | 0.035 |
| IL-2 receptor beta chain in T cell activation | c2:222 | 40 | 11 | 0.0004 | 0.035 |
| B cell antigen receptor | c2:569 | 49 | 18 | 0.0004 | 0.035 |
| IL 4 receptor signaling in B lymphocytes | c2:563 | 39 | 12 | 0.0004 | 0.035 |
| Calcium signaling by HBx of Hepatitis B virus | c2:569 | 16 | 4 | 0.0005 | 0.035 |
| Glycogen processing | c2:602 | 39 | 8 | 0.0005 | 0.035 |
| IGF-1 signaling pathway | c2:213 | 31 | 9 | 0.0005 | 0.035 |
| Down regulated following Apc loss | c2:1048 | 156 | 32 | 0.0005 | 0.035 |
| Liver selective | c2:979 | 300 | 104 | 0.0005 | 0.035 |
| TrkA receptor | c2:559 | 19 | 6 | 0.0006 | 0.035 |
| Inhibition of cellular proliferation by Gleevec | c2:199 | 37 | 10 | 0.0006 | 0.035 |
| IL 6 signaling pathway | c2:226 | 25 | 8 | 0.0006 | 0.035 |
| Insulin signaling pathway | c2:229 | 26 | 8 | 0.0007 | 0.039 |
| Upregulated in fibroblasts following infection with human cytomegalovirus | c2:1269 | 131 | 24 | 0.0008 | 0.040 |
| Down regulated by both curcumin and sulindac in SW260 colon carcinoma cells | c2:1412 | 50 | 10 | 0.0010 | 0.047 |
| Upregulated by TPA in resistant HL-525 cells | c2:1679 | 90 | 19 | 0.0010 | 0.048 |
| Upregulated by UV-B light in epidermal keratinocytes | c2:1717 | 55 | 12 | 0.0004 | 0.56 |
| Upregulated in well functioning transplanted kidney biopsies | c2:836 | 1347 | 285 | 0.0009 | 0.63 |
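The q-value column reflects a multiple-testing adjustment of the p-values across all loci-sets tested. How these q-values were computed is not described here, so the sketch below swaps in the Benjamini-Hochberg adjustment purely to illustrate the general idea; it is not claimed to reproduce the tabulated values. `bh_adjust` is a hypothetical helper, and the input list is made-up illustration data.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only; not taken from the table.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

The running minimum makes the adjusted values non-decreasing in the p-value, which is one way that a run of neighbouring rows can end up sharing a single adjusted value.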