Alexandra Mirina, Gil Atzmon, Kenny Ye, Aviv Bergman.
Abstract
In this work we show that genome-wide association studies (GWAS) are strongly biased in favor of genes covered by larger numbers of SNPs. This bias therefore needs to be corrected when performing downstream gene-level analyses, e.g. pathway analysis and gene-set analysis. We investigate several methods of obtaining gene-level statistical significance in GWAS and compare their effectiveness in correcting this bias. We also propose a simple algorithm, based on the first order statistic, that corrects the bias.
Year: 2012 PMID: 23152854 PMCID: PMC3494661 DOI: 10.1371/journal.pone.0049093
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
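The bias described in the abstract, and one way a first order statistic can remove it, can be illustrated with a small simulation. This is a minimal sketch and not necessarily the authors' algorithm: it assumes that a gene's significance is taken as its smallest SNP p-value and that the SNPs are independent (real SNPs in linkage disequilibrium are not). Under those assumptions the minimum of n uniform p-values follows a Beta(1, n) distribution, so `1 - (1 - p_min)**n` is again uniform under the null. The function names `min_p_corrected` and `mean_gene_p` are hypothetical.

```python
import random

def min_p_corrected(p_values):
    """Correct the smallest of n p-values for the number of tests n.

    Assumes independent tests: under the null, the minimum (first order
    statistic) of n uniform p-values has CDF 1 - (1 - x)**n.
    """
    n = len(p_values)
    p_min = min(p_values)
    return 1.0 - (1.0 - p_min) ** n

def mean_gene_p(n_snps, n_genes, corrected, rng):
    """Average gene-level p-value over simulated null genes of one size."""
    total = 0.0
    for _ in range(n_genes):
        snp_p = [rng.random() for _ in range(n_snps)]  # null SNP p-values
        total += min_p_corrected(snp_p) if corrected else min(snp_p)
    return total / n_genes

rng = random.Random(0)
for n_snps in (1, 10, 100):
    raw = mean_gene_p(n_snps, 2000, False, rng)
    adj = mean_gene_p(n_snps, 2000, True, rng)
    print(f"{n_snps:>3} SNPs: mean raw min-p = {raw:.3f}, mean corrected = {adj:.3f}")
```

With no true association, the uncorrected minimum p-value shrinks as the number of SNPs grows, so larger genes look more significant. The corrected value should stay near 0.5 on average whatever the gene size.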
Figure 1. Distribution of the number of SNPs (log2 scale).
TRG – solid line, all genes – dashed line.
Top categories in Biological Process (GO:0008150) according to Gene Ontology classification system.
| GO number | Category name | Number of genes |
| GO:0000003 | reproduction | 1169 |
| GO:0001906 | cell killing | 64 |
| GO:0002376 | immune system process | 1546 |
| GO:0006791 | sulfur utilization | 0 |
| GO:0006794 | phosphorus utilization | 0 |
| GO:0008152 | metabolic process | 8662 |
| GO:0008283 | cell proliferation | 1360 |
| GO:0009758 | carbohydrate utilization | 2 |
| GO:0009987 | cellular process | 12145 |
| GO:0015976 | carbon utilization | 0 |
| GO:0016032 | viral reproduction | 432 |
| GO:0016265 | death | 1573 |
| GO:0019740 | nitrogen utilization | 1 |
| GO:0022414 | reproductive process | 1165 |
| GO:0022610 | biological adhesion | 884 |
| GO:0023052 | signaling | 4174 |
| GO:0032501 | multicellular organismal process | 5182 |
| GO:0032502 | developmental process | 4094 |
| GO:0040007 | growth | 705 |
| GO:0040011 | locomotion | 1112 |
| GO:0043473 | pigmentation | 52 |
| GO:0048511 | rhythmic process | 187 |
| GO:0048518 | positive regulation of biological process | 2973 |
| GO:0048519 | negative regulation of biological process | 2710 |
| GO:0050789 | regulation of biological process | 7611 |
| GO:0050896 | response to stimulus | 5982 |
| GO:0051179 | localization | 3911 |
| GO:0051234 | establishment of localization | 3253 |
| GO:0051704 | multi-organism process | 963 |
| GO:0065007 | biological regulation | 8045 |
| GO:0071840 | cellular component organization or biogenesis | 3755 |
Figure 2. Box plots of regression coefficients from 10 simulated data sets with random disease status.
The regression coefficients are obtained by regressing the log of the gene-level significance p-value on the number of markers per gene.
Figure 3. P-values for the linear regression model that regresses the log of the gene-level significance p-value on the number of markers per gene.
Plotted are the results from 10 simulated data sets with random disease status.
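The regression diagnostic behind Figures 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation: SNP p-values are drawn as independent uniforms (standing in for random disease status), the gene-level p-value is the minimum SNP p-value, the log is taken base 10, and the corrected value uses the independent-test first order statistic formula. Only the slope is computed, not its p-value. The names `ols_slope` and `simulate_slopes` are hypothetical.

```python
import math
import random

def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(n_genes=3000, max_snps=200, seed=1):
    """Return (raw_slope, corrected_slope) for simulated null genes."""
    rng = random.Random(seed)
    sizes, log_raw, log_adj = [], [], []
    for _ in range(n_genes):
        n = rng.randint(1, max_snps)  # markers in this gene
        # 1 - random() lies in (0, 1], so the log below is always defined
        p_min = min(1.0 - rng.random() for _ in range(n))
        if p_min >= 1.0:
            p_adj = 1.0
        else:
            # numerically stable form of 1 - (1 - p_min)**n
            p_adj = -math.expm1(n * math.log1p(-p_min))
        sizes.append(n)
        log_raw.append(math.log10(p_min))
        log_adj.append(math.log10(p_adj))
    return ols_slope(sizes, log_raw), ols_slope(sizes, log_adj)

raw_slope, adj_slope = simulate_slopes()
print(f"slope before correction: {raw_slope:.2e}")
print(f"slope after correction:  {adj_slope:.2e}")
```

A clearly negative slope before correction means that genes with more markers get smaller p-values by chance alone. A slope near zero after correction indicates that the dependence on gene size has been removed in this idealized setting.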
Summary of the methods.
| | Simes test | Fisher test | GATES | VEGAS | FOSCO |
| Description | Adjust p-values by | Obtain p-value based on | Adjust p-values by | Obtain p-values based on | Adjust p-values by |
| Linear regression coefficient* | 2.32E-04 | −8.32E-03 | −3.14E-04 | −1.00E-04 | −3.75E-04 |
| SNP annotation | NCBI dbSNP | NCBI dbSNP | NCBI dbSNP + 5 kb flanking regions | UCSC Genome Browser hg18 + 50 kb flanking regions | NCBI dbSNP |
Note: * Linear regression coefficient before correction β = −7.23E-03.
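For orientation, the two classical tests in the summary can be sketched in their textbook forms. These may differ in detail from the variants used in the paper, and GATES, VEGAS and FOSCO are not reproduced here. The Simes combined p-value is the minimum over ranks i of n·p(i)/i for the ordered p-values p(1) ≤ … ≤ p(n). Fisher's method refers −2·Σ ln p_i to a chi-square distribution with 2n degrees of freedom, which assumes independent tests. The function names `simes_p` and `fisher_p` are hypothetical.

```python
import math

def simes_p(p_values):
    """Simes combined p-value: min over ranks i of n * p_(i) / i."""
    n = len(p_values)
    ordered = sorted(p_values)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))

def fisher_p(p_values):
    """Fisher combined p-value, assuming independent tests.

    X = -2 * sum(ln p_i) is chi-square with 2n degrees of freedom under
    the null. For even degrees of freedom the upper tail is
    exp(-x/2) * sum_{k=0}^{n-1} (x/2)**k / k!, so no stats library is
    needed. Intended for a modest number of p-values; the running term
    can overflow for very large n.
    """
    n = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, tail = 1.0, 1.0
    for k in range(1, n):
        term *= half_x / k
        tail += term
    return math.exp(-half_x) * tail

snp_p = [0.01, 0.5, 0.9]  # made-up SNP p-values for one gene
print("Simes: ", simes_p(snp_p))
print("Fisher:", fisher_p(snp_p))
```

Simes depends mostly on the strongest signals in the gene, whereas Fisher's statistic accumulates evidence across all markers. This is one reason the two can respond differently to the number of SNPs per gene.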