Gordon Fehringer, Geoffrey Liu, Laurent Briollais, Paul Brennan, Christopher I Amos, Margaret R Spitz, Heike Bickeböller, H Erich Wichmann, Angela Risch, Rayjean J Hung.
Abstract
Pathway analysis has been proposed as a complement to single SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had available software for performing analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations; GenGen (a version of Gene Set Enrichment Analysis (GSEA)), which uses a Kolmogorov-Smirnov-like running sum as the test statistic; and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. There were nearly 18,000 genes available for analysis, following mapping of more than 300,000 SNPs from each data set. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). This included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR ≤ 0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but results suggest mSUMSTAT has advantages over the other approaches, and may be a useful pathway analysis tool to use alongside other methods such as the commonly used GSEA (GenGen) approach.
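To make the "averaging χ² statistics" idea concrete, the following is a minimal sketch of a pathway statistic of that kind. It is not the authors' mSUMSTAT implementation: the function names, the gene labels, and the use of a permutation-based null distribution to obtain a p-value are all assumptions made for illustration, and the numbers are invented.

```python
from statistics import mean


def pathway_mean_chi2(gene_chi2, pathway_genes):
    """Average the per-gene chi-square statistics over the genes in a pathway.

    gene_chi2 maps a gene label to one chi-square statistic (for example the
    statistic of the most significant SNP assigned to that gene).
    """
    values = [gene_chi2[g] for g in pathway_genes if g in gene_chi2]
    if not values:
        raise ValueError("no pathway genes have a statistic")
    return mean(values)


def empirical_p(observed, null_stats):
    """One-sided empirical p-value with the usual +1 correction.

    null_stats holds the same pathway statistic recomputed on permuted data
    (an assumed way of building the null; the source does not specify it).
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


# Hypothetical per-gene statistics and pathway membership.
gene_chi2 = {"GENE_A": 12.0, "GENE_B": 1.5, "GENE_C": 0.5, "GENE_D": 3.0}
pathway = ["GENE_A", "GENE_B", "GENE_C"]

observed = pathway_mean_chi2(gene_chi2, pathway)

# Hypothetical pathway statistics from nine permutations of case/control labels.
null_stats = [1.0, 2.0, 5.0, 0.8, 1.2, 0.9, 1.1, 4.9, 1.3]
p_value = empirical_p(observed, null_stats)
```

In this toy input a single gene with a large statistic dominates the pathway average, which mirrors the interpretive caveat in the abstract about one gene cluster driving a pathway-level result.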
Year: 2012 PMID: 22363742 PMCID: PMC3283683 DOI: 10.1371/journal.pone.0031816
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of study designs, selected epidemiologic variables, genotyping platforms and results.
| Study | Type | Case/control | Sex (males) | Median age (range) | Ever smokers | Chip (SNP number) | SNPs mapping to genes | # of genes after SNP mapping | # of GO pathways |
| Central Europe | Cases: hospital; Controls: hospital and population | 1926/2522 | 75% | 61 (25–89) | Cases: 93%; Controls: 65% | Illumina HumanHap300 (317,139) | | | |
| Toronto | Cases: hospital; Controls: clinic and population | 333/506 | 42% | 58 (20–85) | Cases: 71%; Controls: 56% | Illumina HumanHap300 (317,139) | | | |
| Central Europe/Toronto combined | Cases: hospital; Controls: various | 2258/3027 | 70% | 60 (20–89) | Cases: 89%; Controls: 63% | Illumina HumanHap300 (305,326) | 161,435 | 17,811 | 421 |
| Texas (MD Anderson) | Cases: hospital; Controls: clinic | 1154/1137 | 57% | 62 (31–92) | Cases: 100%; Controls: 100% | Illumina HumanHap300 (317,498) | | | |
| Germany (DKFZ/LUCY/KORA) | Cases: hospital; Controls: population | 504/484 | 57% | 46 (27–51) | Cases: 93%; Controls: 54% | Illumina HumanHap550 (561,466) | | | |
| Texas/Germany combined | Cases: hospital; Controls: various | 1639/1618 | 57% | 57 (27–92) | Cases: 98%; Controls: 86% | Illumina HumanHap300 (302,334) | 160,726 | 17,805 | 420 |
After implementing data quality measures described in methods.
Number of significant pathway associations (using FDR ≤ 0.05) for Central Europe-Toronto (CETO) and Germany-MD Anderson (GRMD) by pathway analysis method.
| Data set | EASE | GenGen | mSUMSTAT | SLAT |
| CETO | 7 | 0 | 8 | 2 |
| GRMD | 5 | 0 | 1 | 3 |
| Both CETO and GRMD | 2 | 0 | 1 | 0 |
| Total | 10 | 0 | 8 | 5 |
Comparison of FDRs (top line) and P-values (in brackets) for Central Europe-Toronto (CETO) and Germany-MD Anderson (GRMD) for top lung cancer risk associated pathways identified by different analysis methods using GO level 4 pathways.
| EASE | | | GenGen | | | mSUMSTAT | | | SLAT | | |
| GO pathway | CETO | GRMD | GO pathway | CETO | GRMD | GO pathway | CETO | GRMD | GO pathway | CETO | GRMD |
|
|
|
|
| 0.194 (0.001) | 0.430 (0.004) |
|
|
|
|
|
|
|
|
|
| immune response | 1.00 (0.735) | 0.271 (0.002) |
|
|
| growth factor activity | 0.2727 | <0.0001 |
|
|
|
|
| 0.316 (0.002) | 0.707 (0.213) |
|
|
| gland development | 0.6215 | 0.0003 |
|
| <0.001 (0.003) | 1.00 (0.065) |
| 1.00 (0.446) | 0.294 (0.002) |
|
|
| glycoprotein metabolic | <0.0001 | 0.1223 |
|
| 1.00 (0.419) | <0.001 (<0.001) | cytokine metabolic | 1.00 (0.278) | 0.314 (0.003) |
|
|
|
| 0.0002 | 0.7325 |
| carboxylic acid transport | <0.001 (0.005) | 0.955 (0.498) |
| 0.381 (0.001) | 0.506 (0.056) |
| 0.036 (0.001) | 0.813 (0.669) |
| 0.0015 | 0.2004 |
|
| 1.00 (0.456) | <0.001 (0.006) |
| 0.361 (0.004) | 0.819 (0.567) |
| 0.034 (0.003) | 0.737 (0.536) | response to steroid hormone | 0.0014 | 0.0418 |
| sensory organ development | 1.00 (0.487) | <0.001 (0.002) | somatic recombination | 1.00 (0.308) | 0.358 (0.002) | antigen processing | 0.032 (0.003) | 0.570 (0.280) | regulation of axonogenesis | 0.0383 | 0.0014 |
| phospholipid transporter | <0.001 (<0.001) | 1.00 (0.280) | peptide receptor | 1.00 (0.453) | 0.370 (0.009) | mRNA binding | 0.052 (0.001) | 0.316 (0.051) | retrograde transport, GER | 0.0007 | 0.4841 |
| muscle contraction | <0.001 (<0.001) | 0.394 (0.085) | positive reg Phosphorous | 1.00 (0.639) | 0.380 (0.018) | anion transport | 0.055 (<0.001) | 0.735 (0.557) | fatty acid regulation | 0.0009 | 0.4935 |
Bold: significant after adjustment for multiple comparisons (FDR≤0.05) in both CETO and GRMD.
Bold with italics: significant after adjustment for multiple comparisons in one data set (FDR≤0.05), nominal significance in other (P≤0.05).
Underline: Top pathways identified by more than one pathway analysis method within a data set.
Abbreviated GO category name. Full category names as follows: Ras-GEF: Ras guanyl-nucleotide exchange factor; LDL binding: low-density lipoprotein binding; acetylcholine receptor: acetylcholine receptor activity; immune response: adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains; complement activation: complement activation, classical pathway; somatic recombination: somatic recombination of immunoglobulin gene segments; peptide receptor: G-protein coupled peptide receptor activity; positive reg phosphorous: positive regulation of phosphorus metabolic process; antigen processing: antigen processing and presentation of peptide antigen via MHC Class I; retrograde transport, GER: Retrograde vesicle mediated transport, golgi to endoplasmic reticulum.
Significant based on Benjamini-Hochberg FDR calculation.
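As an illustration of the Benjamini-Hochberg calculation named in the footnote above, here is a minimal sketch of the standard step-up adjustment. The function name and the example p-values are assumptions for illustration; they are not values from the study, and this is not the software the authors used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDRs) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathway p-values (illustrative only).
p = [0.001, 0.008, 0.039, 0.041, 0.20]
fdr = benjamini_hochberg(p)

# Flag pathways at the FDR <= 0.05 threshold used in the tables.
significant = [q <= 0.05 for q in fdr]
```

With these made-up inputs, the third and fourth pathways have nominal P < 0.05 but adjusted values just above 0.05, the same kind of gap between nominal and FDR-adjusted significance that the table footnotes distinguish.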
Figure 1. Comparison of odds ratios for the acetylcholine receptor pathway, showing:
A) the most significant SNP for each gene used in the Central Europe-Toronto analysis, with odds ratios for the same SNPs in Germany-MD Anderson; B) the most significant SNP assigned to each gene in either data set (i.e., the actual SNPs used in pathway analyses in the two data sets). Chromosome number (Chr) and genes for both graphs are shown on the left. (Central Europe-Toronto SNPs: solid fill; Germany-MD Anderson matching SNPs: no fill; Germany-MD Anderson top SNP (differing from Central Europe-Toronto): grey fill.) A) Reference allele is the same in both Central Europe-Toronto and Germany-MD Anderson but chosen to show a positive association for Central Europe-Toronto. B) Reference allele always chosen to show a positive association. CHRNA5 is excluded as its SNPs are identical to those representing CHRNA3. Odds ratios adjusted for age, sex and country of study.