Ji-Sun Kwon, Jihye Kim, Dougu Nam, Sangsoo Kim.
Abstract
Gene set analysis (GSA) is useful for interpreting the results of a genome-wide association study (GWAS) in terms of biological mechanisms. We compared the performance of two GSA implementations, GSA-SNP and i-GSEA4GWAS, that accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, under the same settings of inputs and parameters. GSA runs were made with two sets of p-values from a Korean type 2 diabetes mellitus GWAS: 259,188 SNPs from the original genotype dataset and 1,152,947 SNPs from the imputed genotype dataset. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively. In contrast, GSA-SNP reported 94 and 38 hits for the same two datasets, respectively. Similar trends, though to a lesser degree, were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets. The very large number of hits by i-GSEA4GWAS for the imputed dataset was probably an artifact of the scaling step in the algorithm. The decrease in hits by GSA-SNP for the imputed dataset may be because it relies on Z-statistics, which are sensitive to variations in the background level of associations. Judicious evaluation of GSA outcomes, perhaps based on multiple programs, is recommended.
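The background sensitivity mentioned in the abstract can be illustrated with a generic gene-set Z-statistic. The sketch below is an assumption for illustration only: it compares the mean score of a gene set against the mean and standard deviation of all gene scores, which is a common textbook form and not necessarily the exact formula implemented in GSA-SNP. The function name `gene_set_z` and all scores are hypothetical.

```python
import math
import statistics


def gene_set_z(set_scores, all_scores):
    """Z-score of a gene set's mean score against the genome-wide background.

    Generic form (assumed, not taken from GSA-SNP itself):
        Z = (mean(set) - mean(all)) / (sd(all) / sqrt(set size))
    """
    mu = statistics.fmean(all_scores)
    sigma = statistics.pstdev(all_scores)
    m = len(set_scores)
    return (statistics.fmean(set_scores) - mu) / (sigma / math.sqrt(m))


# Hypothetical -log10(p) gene scores for one gene set.
set_scores = [3.0, 3.5, 4.0, 4.5]

# Two hypothetical backgrounds that both contain the gene set:
# one with mostly weak associations, one with a raised baseline.
low_background = [0.5, 1.0, 1.0, 1.5, 2.0, 2.0] + set_scores
raised_background = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0] + set_scores

z_low = gene_set_z(set_scores, low_background)
z_raised = gene_set_z(set_scores, raised_background)
print(f"Z against low background:    {z_low:.2f}")
print(f"Z against raised background: {z_raised:.2f}")
```

With identical gene-set scores, the Z-score is smaller against the raised background, so a set that passes a significance cutoff in one dataset can fail it in another purely because the overall level of association changed.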
Keywords: GSA-SNP; gene set analysis; genome-wide association study; i-GSEA4GWAS; imputation
Year: 2012 PMID: 23105940 PMCID: PMC3480679 DOI: 10.5808/GI.2012.10.2.123
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
The number of gene set hits identified by gene set analyses
GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; GSA, gene set analysis; SNP, single nucleotide polymorphism.
ᵃ Either the best or second-best p-value of SNPs residing inside or within 20 kb of the gene boundary was assigned to each gene as the score. Unlike i-GSEA4GWAS, which assigns the best p-value, GSA-SNP has an option to assign the second-best p-value.
Fig. 1. Comparison of the gene scores calculated by two different schemes. All 15,829 genes that were mapped by at least two single nucleotide polymorphisms (SNPs) (either genotyped or imputed) were included in the high-volume scatter plot that displays the local density of points by a false color representation. For a given gene, the p-values of SNPs located inside or within 20 kb of the gene boundary were surveyed. The best (or second-best) of their -log-transformed values was assigned as the gene score. The X and Y axes represent the second-best and best values, respectively.
Fig. 2. Comparison of the gene scores from the unimputed and imputed datasets. All 15,829 genes that were mapped by at least two single nucleotide polymorphisms (SNPs) (either genotyped or imputed) were included in the high-volume scatter plot that displays the local density of points by a false color representation. For a given gene, the p-values of SNPs located inside or within 20 kb of the gene boundary were surveyed. The best of their -log-transformed values was assigned as the gene score. The X and Y axes represent the gene scores from the unimputed and imputed datasets, respectively.
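The gene scoring scheme described in the table footnote and figure captions (best or second-best SNP p-value inside or within 20 kb of the gene boundary, then -log-transformed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of either program: the function `gene_scores`, the tuple layouts, the base-10 logarithm, and the toy coordinates and p-values are all hypothetical.

```python
import math
from collections import defaultdict

FLANK_BP = 20_000  # 20 kb padding on each side of the gene boundary


def gene_scores(genes, snps, k=1):
    """Return {gene name: -log10 of the k-th best SNP p-value}.

    genes: iterable of (name, chrom, start, end)   (hypothetical layout)
    snps:  iterable of (chrom, pos, p_value)       (hypothetical layout)
    k=1 uses the best p-value, k=2 the second best.
    Genes mapped by fewer than k SNPs are left out.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, p in snps:
        by_chrom[chrom].append((pos, p))

    scores = {}
    for name, chrom, start, end in genes:
        lo, hi = start - FLANK_BP, end + FLANK_BP
        pvals = sorted(p for pos, p in by_chrom[chrom] if lo <= pos <= hi)
        if len(pvals) >= k:
            scores[name] = -math.log10(pvals[k - 1])
    return scores


# Toy data: GENE_A is mapped by two SNPs, GENE_B by only one.
genes = [("GENE_A", "1", 100_000, 150_000), ("GENE_B", "1", 400_000, 420_000)]
snps = [
    ("1", 90_000, 1e-6),   # within 20 kb upstream of GENE_A
    ("1", 120_000, 1e-3),  # inside GENE_A
    ("1", 175_000, 0.5),   # more than 20 kb from GENE_A, ignored
    ("1", 410_000, 0.01),  # inside GENE_B
]

best = gene_scores(genes, snps, k=1)
second = gene_scores(genes, snps, k=2)
print("best:", best)
print("second-best:", second)
```

In this toy input, GENE_A gets a much lower score under the second-best scheme than under the best scheme, and GENE_B has no second-best score at all, which mirrors why the figures include only genes mapped by at least two SNPs.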