Han Zhang, William Wheeler, Paula L Hyland, Yifan Yang, Jianxin Shi, Nilanjan Chatterjee, Kai Yu.
Abstract
Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary-based Adaptive Rank Truncated Product (sARTP) for gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlations estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP using empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis of type 2 diabetes (T2D) with sARTP by integrating SNP-level summary statistics from two large studies comprising 19,809 T2D cases and 111,181 controls of European ancestry. Among 4,713 candidate pathways, after excluding genes in the neighborhoods of 170 GWAS-established T2D loci, we detected 43 globally significant T2D pathways (Bonferroni-corrected p-values < 0.05). These included the insulin signaling pathway and the T2D pathway defined by KEGG, as well as pathways defined by specific gene expression patterns in pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 East Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 of the 43 pathways identified in European populations remained significant in East Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP capable of analyzing pathways with thousands of genes and tens of thousands of SNPs.
Year: 2016 PMID: 27362418 PMCID: PMC4928884 DOI: 10.1371/journal.pgen.1006122
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
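The abstract's key idea is that the null distribution of a multi-marker statistic can be obtained from SNP-level summary statistics plus a genotype correlation matrix taken from a reference panel, rather than from permutations of individual-level data. The following is a minimal, single-level sketch of that idea, assuming two-sided normal p-values, illustrative truncation points (1, 2, 5), and a plain Monte Carlo null drawn from N(0, R). The function names are hypothetical, and this is not the authors' R package, which also aggregates gene-level results into pathway p-values.

```python
import math

import numpy as np

# Vectorized complementary error function built from the standard library.
_erfc = np.vectorize(math.erfc, otypes=[float])


def two_sided_p(z):
    """Two-sided normal p-values: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return _erfc(np.abs(np.asarray(z, dtype=float)) / math.sqrt(2.0))


def rtp_stats(p, ks):
    """Rank truncated product statistics W_k = sum of -log of the k smallest p-values.

    p has shape (n_rep, n_snp); the result has shape (n_rep, len(ks)).
    """
    neg_log = -np.log(np.sort(p, axis=1))
    return np.cumsum(neg_log, axis=1)[:, [k - 1 for k in ks]]


def sartp_gene_pvalue(z_obs, R, truncation_points=(1, 2, 5), n_sim=20000, seed=0):
    """Hypothetical single-level ARTP p-value computed from summary z-scores.

    Null z-score vectors are simulated from N(0, R), where R is the SNP
    correlation matrix estimated from a reference panel.
    """
    z_obs = np.asarray(z_obs, dtype=float)
    ks = [k for k in truncation_points if k <= z_obs.size]
    rng = np.random.default_rng(seed)
    z_null = rng.multivariate_normal(np.zeros(z_obs.size), R, size=n_sim)

    w_obs = rtp_stats(two_sided_p(z_obs)[None, :], ks)  # shape (1, J)
    w_null = rtp_stats(two_sided_p(z_null), ks)         # shape (n_sim, J)

    # Per-truncation-point p-values, all evaluated against the same null draws.
    p_obs_k = np.mean(w_null >= w_obs, axis=0)
    ranks = np.argsort(np.argsort(w_null, axis=0), axis=0)  # 0-based ascending ranks
    p_null_k = (n_sim - ranks) / n_sim                      # assumes no ties

    # Adaptive step: take the minimum over truncation points, then calibrate
    # that minimum with the same null draws so the choice of k is accounted for.
    min_obs = p_obs_k.min()
    min_null = p_null_k.min(axis=1)
    return (np.sum(min_null <= min_obs) + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Illustrative 3-SNP gene with a made-up correlation matrix.
    R_demo = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]])
    print(sartp_gene_pvalue([3.5, 2.8, 0.4], R_demo, truncation_points=(1, 2, 3)))
```

Reusing one set of null draws for both the per-k p-values and the minimum is what keeps the adaptive choice of truncation point from inflating the type I error in this sketch.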
Empirical sizes of the sARTP, MsARTP, and MsARTP-u procedures.
| Method | Reference | α = 0.05 | α = 0.01 | α = 0.005 | α = 0.001 | α = 0.0005 |
|---|---|---|---|---|---|---|
| sARTP | External | 0.050 | 0.0093 | 0.0040 | 0.00078 | 0.00044 |
| sARTP | Internal | 0.046 | 0.0087 | 0.0042 | 0.00074 | 0.00040 |
| MsARTP | External | 0.048 | 0.0093 | 0.0040 | 0.00076 | 0.00041 |
| MsARTP | Internal | 0.048 | 0.0084 | 0.0042 | 0.00074 | 0.00041 |
| MsARTP-u | External | 0.082 | 0.018 | 0.0081 | 0.0013 | 0.00064 |
| MsARTP-u | Internal | 0.094 | 0.022 | 0.011 | 0.0016 | 0.00081 |
Empirical sizes are estimated from 500,000 datasets simulated from the GERA data.
External: 503 European samples from the 1000 Genomes Project used as the reference panel;
Internal: 500 samples from the GERA data used as the reference panel.
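An empirical size is the fraction of null datasets whose p-value falls at or below the nominal level. A minimal sketch of that computation follows; the function name is hypothetical and the uniform p-values in the usage lines merely stand in for p-values from simulated null datasets. It also reports the binomial Monte Carlo standard error expected for an exactly calibrated test, which gives a yardstick for reading the table: by simple arithmetic, with 500,000 replicates this standard error at α = 0.05 is about 0.0003.

```python
import math
import random


def empirical_size(null_pvalues, alphas):
    """Return {alpha: (empirical size, binomial Monte Carlo SE)}.

    The SE is sqrt(alpha * (1 - alpha) / n), i.e. the spread expected if the
    test were exactly calibrated at level alpha.
    """
    n = len(null_pvalues)
    result = {}
    for alpha in alphas:
        rejections = sum(1 for p in null_pvalues if p <= alpha)
        result[alpha] = (rejections / n, math.sqrt(alpha * (1.0 - alpha) / n))
    return result


if __name__ == "__main__":
    # Uniform draws stand in for null p-values from a well-calibrated test.
    rng = random.Random(1)
    simulated = [rng.random() for _ in range(500_000)]
    for alpha, (size, se) in empirical_size(simulated, [0.05, 0.01, 0.001]).items():
        print(f"alpha={alpha}: size={size:.5f} (MC SE {se:.5f})")
```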
Power comparisons under the type I error rate of 0.05 when analyzing data from three studies.
| Single-locus power | Causal genes | sARTP (internal) | MsARTP (internal) | Fisher (internal) | sARTP (external) | MsARTP (external) | Fisher (external) |
|---|---|---|---|---|---|---|---|
| 0.3 | 5 | 0.165 | 0.170 | 0.110 | 0.170 | 0.167 | 0.105 |
| 0.3 | 10 | 0.405 | 0.402 | 0.229 | 0.399 | 0.401 | 0.221 |
| 0.3 | 15 | 0.573 | 0.578 | 0.334 | 0.564 | 0.561 | 0.323 |
| 0.4 | 5 | 0.292 | 0.293 | 0.162 | 0.295 | 0.297 | 0.154 |
| 0.4 | 10 | 0.642 | 0.637 | 0.363 | 0.640 | 0.635 | 0.362 |
| 0.4 | 15 | 0.858 | 0.858 | 0.574 | 0.855 | 0.856 | 0.561 |
For every combination of single-locus power and number of causal genes, the empirical powers are computed from 1,000 simulated datasets at the 0.05 level. Each dataset contains three studies. The pathway consists of 50 independent genes, each with 20 SNPs. Fisher's method is used to combine the three pathway p-values obtained by applying sARTP separately to the SNP-level summary data from each of the three studies.
Single-locus power: the theoretical power of the single-locus trend test on the functional SNP under a type I error rate of 0.05, given the case and control sample sizes and the MAF of the functional SNP;
Causal genes: the number of genes containing the functional SNPs;
Internal reference: 500 samples from the GERA data used as the reference panel;
External reference: 503 European samples from the 1000 Genomes Project used as the reference panel.
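The Fisher comparator in this table combines the three per-study sARTP pathway p-values into one. A minimal sketch of Fisher's method follows; the function name is hypothetical. Under the null, X = −2 Σ log p follows a chi-square distribution with 2k degrees of freedom for k independent p-values, and for even degrees of freedom the survival function has the closed form exp(−X/2) Σ_{i=0}^{k−1} (X/2)^i / i!, so no statistics library is needed.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method for k independent p-values.

    Returns (X, p) where X = -2 * sum(log p_i) and p = P(chi2_{2k} >= X),
    computed with the closed-form survival function for even df.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0  # running (x/2)^i / i! and its partial sum
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total


if __name__ == "__main__":
    # Three illustrative per-study pathway p-values.
    print(fisher_combine([0.04, 0.10, 0.03]))
```

Because Fisher's method only sees one summary p-value per study, it cannot pool SNP-level evidence across studies before truncation, which is one plausible reading of why it trails sARTP and MsARTP in the table.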
Fig 1. Comparisons of p-values from three types of pathway analyses on the GERA data.
Based on the GERA data, 4,713 pathways are analyzed in three different ways. Pathway p-values obtained by ARTP using the GERA individual-level genetic data (x-axis) are compared with those obtained by sARTP using summary statistics combined with an internal reference panel of 500 randomly selected GERA samples (left) and with an external reference panel of 503 European subjects from the 1000 Genomes Project (right).
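The two sARTP panels in Fig 1 differ only in which reference panel supplies the SNP correlation matrix. A minimal sketch of estimating that matrix from a reference genotype matrix (samples × SNPs, coded 0/1/2) follows. The function name and the optional shrinkage toward the identity are assumptions for illustration, not a step stated in the source; shrinkage is one common way to keep a small-panel estimate well conditioned.

```python
import numpy as np


def ld_from_reference(genotypes, shrink=0.0):
    """Estimate the SNP-by-SNP correlation matrix from reference genotypes.

    genotypes: array-like of shape (n_samples, n_snps) with 0/1/2 allele counts.
    shrink: hypothetical weight in [0, 1] pulling the estimate toward identity.
    """
    G = np.asarray(genotypes, dtype=float)
    if np.any(G.std(axis=0) == 0):
        raise ValueError("monomorphic SNP in reference panel; drop it first")
    R = np.corrcoef(G, rowvar=False)  # columns are SNPs
    return (1.0 - shrink) * R + shrink * np.eye(R.shape[0])


if __name__ == "__main__":
    # Tiny made-up panel: 4 reference samples, 3 SNPs.
    panel = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
    print(ld_from_reference(panel, shrink=0.05))
```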
Fig 2. Q-Q plots of gene-level and pathway p-values based on the sARTP procedure on the DIAGRAM study, the GERA study, and the two studies combined.
(Left) Q-Q plots of gene-level p-values on 15,946 genes based on the sARTP gene-based analysis of the DIAGRAM study (DIAGRAM), the GERA study (GERA), and the two studies combined (META). (Right) Q-Q plots of pathway p-values on 4,713 pathways based on the sARTP pathway analysis of the DIAGRAM study (DIAGRAM), the GERA study (GERA), and the two studies combined (META).
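A Q-Q plot of p-values pairs the sorted observed −log10 p-values with the −log10 quantiles expected under a uniform null, so points near the diagonal indicate no excess signal. A minimal sketch of computing those coordinates follows; the function name is hypothetical, and the (i − 0.5)/n plotting positions are a common convention assumed here, since the figure does not state which were used.

```python
import math


def qq_coordinates(pvalues):
    """Return (expected, observed) -log10 p-values for a uniform-null Q-Q plot.

    Both lists are in descending order, so expected[i] pairs with observed[i].
    """
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return expected, observed


if __name__ == "__main__":
    exp_q, obs_q = qq_coordinates([0.01, 0.5, 0.1, 1.0])
    for e, o in zip(exp_q, obs_q):
        print(f"expected {e:.3f}  observed {o:.3f}")
```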
Summary of 43 significant pathways detected by the pathway meta-analysis based on the DIAGRAM and GERA studies.
| Pathway | META | DIAGRAM | GERA |
|---|---|---|---|
| SCHLOSSER_SERUM_RESPONSE_UP | 2.50E-08 | 2.92E-04 | 1.77E-03 |
| PENG_RAPAMYCIN_RESPONSE_DN | 1.50E-07 | 1.68E-03 | 2.08E-04 |
| YAGI_AML_WITH_T_8_21_TRANSLOCATION | 1.50E-07 | 4.33E-03 | 4.46E-04 |
| PATIL_LIVER_CANCER | 2.00E-07 | 4.35E-05 | 3.89E-03 |
| PUJANA_CHEK2_PCC_NETWORK | 2.00E-07 | 1.18E-02 | 3.39E-03 |
| STEIN_ESRRA_TARGETS | 2.00E-07 | 9.37E-04 | 8.39E-04 |
| STEIN_ESRRA_TARGETS_UP | 3.00E-07 | 6.38E-03 | 1.02E-04 |
| WANG_CISPLATIN_RESPONSE_AND_XPC_UP | 4.00E-07 | 6.75E-03 | 1.18E-01 |
| CADWELL_ATG16L1_TARGETS_DN | 4.35E-07 | 1.45E-03 | 9.59E-03 |
| SONG_TARGETS_OF_IE86_CMV_PROTEIN | 5.30E-07 | 7.88E-04 | 3.20E-05 |
| CASORELLI_ACUTE_PROMYELOCYTIC_LEUKEMIA_DN | 5.50E-07 | 2.48E-02 | 5.70E-03 |
| RIZ_ERYTHROID_DIFFERENTIATION | 6.15E-07 | 2.33E-02 | 2.31E-02 |
| BORCZUK_MALIGNANT_MESOTHELIOMA_UP | 6.50E-07 | 7.61E-03 | 2.52E-02 |
| HILLION_HMGA1_TARGETS | 9.00E-07 | 3.39E-01 | 1.10E-05 |
| KEGG_MATURITY_ONSET_DIABETES_OF_THE_YOUNG | 1.14E-06 | 1.68E-02 | 3.58E-04 |
| HOLLEMAN_ASPARAGINASE_RESISTANCE_B_ALL_DN | 1.22E-06 | 3.42E-02 | 1.67E-03 |
| PUJANA_BRCA1_PCC_NETWORK | 1.50E-06 | 1.69E-02 | 5.11E-03 |
| HOSHIDA_LIVER_CANCER_SUBCLASS_S3 | 1.95E-06 | 6.14E-03 | 5.83E-03 |
| GRAESSMANN_APOPTOSIS_BY_DOXORUBICIN_DN | 2.00E-06 | 9.57E-04 | 6.41E-04 |
| REACTOME_REGULATION_OF_BETA_CELL_DEVELOPMENT | 2.26E-06 | 4.81E-02 | 1.15E-03 |
| PUJANA_BREAST_CANCER_WITH_BRCA1_MUTATED_UP | 2.48E-06 | 7.61E-03 | 2.39E-02 |
| BLALOCK_ALZHEIMERS_DISEASE_UP | 2.50E-06 | 3.52E-02 | 3.26E-02 |
| GOBERT_OLIGODENDROCYTE_DIFFERENTIATION_UP | 2.85E-06 | 3.68E-02 | 5.37E-04 |
| MCBRYAN_PUBERTAL_BREAST_4_5WK_DN | 2.95E-06 | 2.20E-01 | 1.10E-05 |
| REACTOME_REGULATION_OF_GENE_EXPRESSION_IN_BETA_CELLS | 3.11E-06 | 2.81E-02 | 2.33E-03 |
| SANSOM_APC_TARGETS_DN | 3.25E-06 | 3.08E-01 | 2.84E-03 |
| NABA_MATRISOME | 3.45E-06 | 4.46E-02 | 1.30E-02 |
| PUJANA_BRCA2_PCC_NETWORK | 4.65E-06 | 1.25E-02 | 6.32E-02 |
| KEGG_TYPE_II_DIABETES_MELLITUS | 4.85E-06 | 6.38E-02 | 9.93E-04 |
| LINDGREN_BLADDER_CANCER_CLUSTER_1_DN | 5.50E-06 | 5.20E-02 | 4.36E-03 |
| ROPERO_HDAC2_TARGETS | 6.04E-06 | 2.11E-02 | 1.41E-03 |
| KEGG_INSULIN_SIGNALING_PATHWAY | 6.20E-06 | 2.65E-02 | 4.26E-03 |
| CHEN_PDGF_TARGETS | 6.36E-06 | 2.48E-03 | 1.04E-02 |
| REACTOME_INTEGRATION_OF_ENERGY_METABOLISM | 6.50E-06 | 6.11E-02 | 1.06E-04 |
| PETROVA_ENDOTHELIUM_LYMPHATIC_VS_BLOOD_UP | 6.60E-06 | 1.79E-01 | 8.72E-03 |
| REACTOME_PPARA_ACTIVATES_GENE_EXPRESSION | 6.70E-06 | 4.97E-02 | 1.69E-02 |
| AGUIRRE_PANCREATIC_CANCER_COPY_NUMBER_UP | 6.90E-06 | 1.73E-01 | 1.97E-04 |
| DACOSTA_UV_RESPONSE_VIA_ERCC3_UP | 7.45E-06 | 2.20E-02 | 4.25E-02 |
| TOYOTA_TARGETS_OF_MIR34B_AND_MIR34C | 8.10E-06 | 1.20E-01 | 2.67E-03 |
| HOLLEMAN_ASPARAGINASE_RESISTANCE_ALL_DN | 8.43E-06 | 5.90E-02 | 3.22E-03 |
| REACTOME_CLASS_I_MHC_MEDIATED_ANTIGEN_PROCESSING_PRESENTATION | 8.45E-06 | 5.67E-02 | 2.00E-02 |
| DODD_NASOPHARYNGEAL_CARCINOMA_DN | 1.00E-05 | 2.86E-03 | 1.71E-04 |
| REACTOME_MEMBRANE_TRAFFICKING | 1.04E-05 | 6.43E-03 | 4.52E-04 |
The 43 pathways were identified among 4,713 candidate pathways as those with pathway meta-analysis p-values below 1.06×10−5, the Bonferroni-corrected threshold (0.05/4,713).
META: p-values based on summary statistics combined from the DIAGRAM and GERA studies;
DIAGRAM: p-values based on summary statistics from the DIAGRAM study;
GERA: p-values based on summary statistics from the GERA study;
Pathways that do not contain genes in the 17q21 region;
Pathways that contain at least one gene in the 17q21 region;
Pathways that remain globally significant after excluding genes in the 17q21 region.
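The Bonferroni threshold used for this table is 0.05/4,713 ≈ 1.06×10−5. A minimal sketch of selecting globally significant pathways with it follows; the function name is hypothetical. The usage lines take two META p-values from the table and add one made-up pathway that falls just above the threshold.

```python
ALPHA = 0.05
N_CANDIDATE_PATHWAYS = 4713


def bonferroni_hits(pathway_p, alpha=ALPHA, m=N_CANDIDATE_PATHWAYS):
    """Return (threshold, pathways with p < alpha / m sorted by p-value)."""
    threshold = alpha / m
    hits = sorted((name for name, p in pathway_p.items() if p < threshold),
                  key=pathway_p.get)
    return threshold, hits


if __name__ == "__main__":
    meta_p = {
        "REACTOME_MEMBRANE_TRAFFICKING": 1.04e-5,  # from the table above
        "SCHLOSSER_SERUM_RESPONSE_UP": 2.50e-8,    # from the table above
        "HYPOTHETICAL_PATHWAY": 2.0e-5,            # made up, not significant
    }
    print(bonferroni_hits(meta_p))
```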
Fig 3. Heat map of gene-level p-values on selected genes within 43 significant pathways based on the DIAGRAM and GERA studies.
The 43 significant pathways contain 46 unique genes with gene-level meta-analysis p-values less than 0.001. Each row in the plot represents one of the 43 significant pathways, and each column represents one of the 46 unique genes; the chromosome IDs of the genes are given in parentheses. The color of each cell represents the gene-level p-value on the −log10 scale, and cells for genes not included in the corresponding pathway are colored gray. Genes (x-axis) and pathways (y-axis) are ordered by their gene-level and pathway meta-analysis p-values, respectively.
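The heat map is a pathway-by-gene matrix of −log10 gene-level p-values with missing cells for genes outside a pathway. A minimal sketch of building that matrix follows. The function name, the gene and pathway names in the usage lines, and the ascending sort order are all assumptions for illustration; `None` marks the cells that would be drawn gray.

```python
import math


def heatmap_matrix(gene_p, pathway_p, membership, gene_cutoff=1e-3):
    """Build rows (pathways), columns (genes), and a -log10 p-value matrix.

    Genes are kept if p < gene_cutoff and they belong to at least one pathway.
    Rows and columns are sorted by ascending p-value; None marks a gene that
    is not a member of that row's pathway (drawn gray in the figure).
    """
    pathways = sorted(pathway_p, key=pathway_p.get)
    genes = sorted((g for g, p in gene_p.items() if p < gene_cutoff), key=gene_p.get)
    genes = [g for g in genes if any(g in membership[pw] for pw in pathways)]
    matrix = [[-math.log10(gene_p[g]) if g in membership[pw] else None for g in genes]
              for pw in pathways]
    return pathways, genes, matrix


if __name__ == "__main__":
    # Hypothetical inputs, not data from the study.
    gene_p = {"GENE_A": 1e-4, "GENE_B": 1e-6, "GENE_C": 0.01}
    pathway_p = {"PATHWAY_1": 1e-6, "PATHWAY_2": 1e-7}
    membership = {"PATHWAY_1": {"GENE_A", "GENE_C"}, "PATHWAY_2": {"GENE_A", "GENE_B"}}
    print(heatmap_matrix(gene_p, pathway_p, membership))
```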
Pathway p-values and FDR-adjusted p-values based on the AGEN-T2D study.
| Pathway | P-value | FDR |
|---|---|---|
| PATIL_LIVER_CANCER | 0.0014 | 0.029 |
| CADWELL_ATG16L1_TARGETS_DN | 0.0023 | 0.029 |
| HOSHIDA_LIVER_CANCER_SUBCLASS_S3 | 0.0025 | 0.029 |
| GOBERT_OLIGODENDROCYTE_DIFFERENTIATION_UP | 0.0027 | 0.029 |
| DACOSTA_UV_RESPONSE_VIA_ERCC3_UP | 0.011 | 0.074 |
| CHEN_PDGF_TARGETS | 0.011 | 0.074 |
| REACTOME_MEMBRANE_TRAFFICKING | 0.012 | 0.074 |
| AGUIRRE_PANCREATIC_CANCER_COPY_NUMBER_UP | 0.026 | 0.14 |
| MCBRYAN_PUBERTAL_BREAST_4_5WK_DN | 0.041 | 0.19 |
| LINDGREN_BLADDER_CANCER_CLUSTER_1_DN | 0.043 | 0.19 |
| PUJANA_CHEK2_PCC_NETWORK | 0.057 | 0.21 |
| BLALOCK_ALZHEIMERS_DISEASE_UP | 0.059 | 0.21 |
| SCHLOSSER_SERUM_RESPONSE_UP | 0.085 | 0.27 |
| RIZ_ERYTHROID_DIFFERENTIATION | 0.089 | 0.27 |
| DODD_NASOPHARYNGEAL_CARCINOMA_DN | 0.097 | 0.28 |
| TOYOTA_TARGETS_OF_MIR34B_AND_MIR34C | 0.10 | 0.28 |
| REACTOME_CLASS_I_MHC_MEDIATED_ANTIGEN_PROCESSING_PRESENTATION | 0.14 | 0.32 |
| PUJANA_BRCA2_PCC_NETWORK | 0.14 | 0.32 |
| WANG_CISPLATIN_RESPONSE_AND_XPC_UP | 0.14 | 0.32 |
| STEIN_ESRRA_TARGETS | 0.16 | 0.35 |
| PUJANA_BREAST_CANCER_WITH_BRCA1_MUTATED_UP | 0.17 | 0.35 |
| GRAESSMANN_APOPTOSIS_BY_DOXORUBICIN_DN | 0.18 | 0.36 |
| PUJANA_BRCA1_PCC_NETWORK | 0.23 | 0.43 |
| HOLLEMAN_ASPARAGINASE_RESISTANCE_B_ALL_DN | 0.35 | 0.62 |
| PENG_RAPAMYCIN_RESPONSE_DN | 0.38 | 0.62 |
| ROPERO_HDAC2_TARGETS | 0.38 | 0.62 |
| KEGG_MATURITY_ONSET_DIABETES_OF_THE_YOUNG | 0.39 | 0.62 |
| HILLION_HMGA1_TARGETS | 0.43 | 0.65 |
| HOLLEMAN_ASPARAGINASE_RESISTANCE_ALL_DN | 0.44 | 0.65 |
| PETROVA_ENDOTHELIUM_LYMPHATIC_VS_BLOOD_UP | 0.45 | 0.65 |
| REACTOME_PPARA_ACTIVATES_GENE_EXPRESSION | 0.52 | 0.72 |
| REACTOME_REGULATION_OF_GENE_EXPRESSION_IN_BETA_CELLS | 0.55 | 0.74 |
| KEGG_INSULIN_SIGNALING_PATHWAY | 0.58 | 0.74 |
| STEIN_ESRRA_TARGETS_UP | 0.59 | 0.74 |
| YAGI_AML_WITH_T_8_21_TRANSLOCATION | 0.64 | 0.76 |
| NABA_MATRISOME | 0.65 | 0.76 |
| KEGG_TYPE_II_DIABETES_MELLITUS | 0.67 | 0.76 |
| SANSOM_APC_TARGETS_DN | 0.69 | 0.76 |
| REACTOME_INTEGRATION_OF_ENERGY_METABOLISM | 0.70 | 0.76 |
| REACTOME_REGULATION_OF_BETA_CELL_DEVELOPMENT | 0.70 | 0.76 |
| BORCZUK_MALIGNANT_MESOTHELIOMA_UP | 0.75 | 0.79 |
| SONG_TARGETS_OF_IE86_CMV_PROTEIN | 0.86 | 0.88 |
| CASORELLI_ACUTE_PROMYELOCYTIC_LEUKEMIA_DN | 0.88 | 0.88 |
These 43 pathways were nominated by the pathway meta-analysis of the DIAGRAM and GERA studies and then tested using summary data from the AGEN-T2D study.
P-value: p-values based on summary statistics from the AGEN-T2D study;
FDR: FDR-adjusted p-values;
Pathways that do not contain genes in the 17q21 region.
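The source does not name the FDR procedure, but the reported values appear consistent with the Benjamini-Hochberg step-up adjustment across the 43 pathways; for example, the four smallest AGEN-T2D p-values share an adjusted value of 0.029, and 0.0027 × 43 / 4 ≈ 0.029. A minimal sketch of that adjustment follows, assuming Benjamini-Hochberg; the function name is hypothetical. Exact agreement with every table entry should not be expected, because the table lists rounded p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    adjusted_(r) = min over ranks s >= r of p_(s) * m / s, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
    print(adj, sum(a < 0.1 for a in adj), "pass FDR < 0.1")
```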