Laura Buzdugan, Markus Kalisch, Arcadi Navarro, Daniel Schunk, Ernst Fehr, Peter Bühlmann.
Abstract
MOTIVATION: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS.
Year: 2016 PMID: 27153677 PMCID: PMC4920127 DOI: 10.1093/bioinformatics/btw128
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic overview of the method. ‘Clustering’ refers to the step of hierarchically clustering the SNPs. SNPs on different chromosomes are clustered separately, after which the 22 chromosome-level cluster trees are joined into one final cluster tree containing all SNPs. ‘Multi-Sample Splitting and SNP Screening’ stands for the SNP selection in steps 1 and 2 of the method described in Section 2.4.2. These selected SNPs are used to compute the P-values. Finally, the last step of the method, ‘Hierarchical Testing’, uses the selected SNPs to test groups of SNPs and eventually single SNPs. This testing is done hierarchically, on the cluster tree previously constructed. The output of the method consists of significant groups, or single SNPs, along with their P-values, which are adjusted for multiple testing.
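The top-down character of the ‘Hierarchical Testing’ step can be illustrated with a minimal sketch. It assumes a cluster tree is already available and that some function returns a multiplicity-adjusted P-value for any group of SNPs. The names `Node`, `hierarchical_test` and `p_value` are hypothetical, and the sketch does not reproduce the multi-sample splitting or the P-value adjustment of the actual method. It only shows how a group is refined into its children when its null hypothesis is rejected, and how the smallest rejected groups are reported.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """A node of the cluster tree: a group of SNPs and its sub-groups."""
    snps: Tuple[str, ...]
    children: List["Node"] = field(default_factory=list)


def hierarchical_test(
    root: Node,
    p_value: Callable[[Tuple[str, ...]], float],  # placeholder for an adjusted group test
    alpha: float = 0.05,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return the smallest rejected groups as (snps, p) pairs."""
    found: List[Tuple[Tuple[str, ...], float]] = []

    def visit(node: Node) -> bool:
        p = p_value(node.snps)
        if p > alpha:
            return False  # not rejected: do not descend into this subtree
        # Test every child; a list (not a generator) so all children are visited.
        child_rejected = [visit(child) for child in node.children]
        if not any(child_rejected):
            found.append((node.snps, p))  # no rejected sub-group: report this one
        return True

    visit(root)
    return found


# Toy tree with hand-picked P-values (illustrative numbers only).
tree = Node(("a", "b", "c", "d"), [
    Node(("a", "b"), [Node(("a",)), Node(("b",))]),
    Node(("c", "d")),
])
toy_p = {("a", "b", "c", "d"): 0.001, ("a", "b"): 0.01,
         ("c", "d"): 0.2, ("a",): 0.03, ("b",): 0.5}
result = hierarchical_test(tree, toy_p.__getitem__)
```

In this toy run the search descends from the full group to `("a", "b")` and then to the single SNP `("a",)`, which is the only group reported.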
Fig. 2. The final cluster tree. The SNPs are first partitioned into chromosomes, and then a cluster tree is built for each chromosome separately using hierarchical clustering with average linkage. The hierarchical clusters of SNPs within chromosomes are not shown due to their size.
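As a rough illustration of the tree construction described in the caption, the sketch below runs a naive average-linkage agglomeration within each chromosome and then joins the per-chromosome trees under one root. The function names and the toy dissimilarity (absolute difference of made-up positions) are assumptions for illustration; the dissimilarity measure used by the authors is not stated in this excerpt, and a real analysis would use an optimized clustering library rather than this cubic-time loop.

```python
from typing import Callable, Dict, List


def average_linkage_tree(labels: List[str],
                         dissimilarity: Callable[[str, str], float]):
    """Naive average-linkage clustering; returns a nested tuple of labels."""
    # Each cluster is (subtree, list of member labels).
    clusters = [(label, [label]) for label in labels]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i][1], clusters[j][1]
                # Average linkage: mean pairwise dissimilarity between members.
                d = sum(dissimilarity(x, y) for x in a for y in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def build_final_tree(snps_by_chromosome: Dict[int, List[str]],
                     dissimilarity: Callable[[str, str], float]):
    """Cluster each chromosome separately, then join the trees under one root."""
    return tuple(average_linkage_tree(snps, dissimilarity)
                 for _, snps in sorted(snps_by_chromosome.items()))


# Toy data: hypothetical SNP labels and positions on two chromosomes.
positions = {"a": 1, "b": 2, "c": 10, "d": 5, "e": 6}
by_chromosome = {1: ["a", "b", "c"], 2: ["d", "e"]}
final_tree = build_final_tree(
    by_chromosome, lambda x, y: abs(positions[x] - positions[y]))
```

SNPs on different chromosomes are never compared with each other here; they meet only at the root, which mirrors the "cluster separately, then join" structure of the caption.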
Simulation results
| Design | Method | FWER | K | POW | Adaptive POW |
|---|---|---|---|---|---|
| 1 | hierGWAS | 0.14 | 2 | 0.70 | 0.63 |
| 1 | PLINK | 1 | 44 | 0.89 | |
| 1 | GCTA | 1 | 44 | 0.89 | |
| 1 | FaST-LMM | 1 | 44 | 0.89 | |
| 2 | hierGWAS | 0.29 | 2 | 0.72 | 0.66 |
| 2 | PLINK | 1 | 81 | 0.87 | |
| 2 | GCTA | 1 | 93 | 0.89 | |
| 2 | FaST-LMM | 1 | 93 | 0.89 | |
| 3 | hierGWAS | 0.56 | 3 | 0.94 | 0.85 |
| 3 | PLINK | 1 | 130 | 0.94 | |
| 3 | GCTA | 1 | 131 | 0.94 | |
| 3 | FaST-LMM | 1 | 130 | 0.94 | |
Comparison of four methods for three different scenarios.
FWER, familywise error rate; K, value of k such that k-FWER ; POW, power; Adaptive POW, adaptive power.
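To make the error-rate columns concrete, here is a minimal sketch of how such quantities could be estimated from repeated simulation runs. It uses the standard definitions: the FWER is the probability of at least one false selection, and the k-FWER is the probability of at least k false selections. The function names and the toy runs are hypothetical, power is computed here simply as the average fraction of causal SNPs recovered, and the exact definitions of power and adaptive power used in the paper are not given in this excerpt.

```python
from typing import Iterable, List, Set


def k_fwer(runs: Iterable[Set[str]], truth: Set[str], k: int = 1) -> float:
    """Fraction of runs with at least k false selections (k=1 gives the FWER)."""
    runs = list(runs)
    bad = sum(1 for selected in runs if len(selected - truth) >= k)
    return bad / len(runs)


def power(runs: Iterable[Set[str]], truth: Set[str]) -> float:
    """Average fraction of truly associated SNPs that were selected."""
    runs = list(runs)
    return sum(len(selected & truth) / len(truth) for selected in runs) / len(runs)


# Toy example: two causal SNPs, four simulated runs (made-up selections).
truth = {"s1", "s2"}
runs: List[Set[str]] = [{"s1"}, {"s1", "s2", "x"}, {"x", "y"}, set()]

fwer_estimate = k_fwer(runs, truth)         # runs 2 and 3 contain a false selection
two_fwer_estimate = k_fwer(runs, truth, 2)  # only run 3 contains two false selections
power_estimate = power(runs, truth)
```

Under these definitions, a method with FWER equal to 1 makes at least one false selection in every run, and the K column asks how many false selections must be tolerated before the corresponding error rate becomes small.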
List of small significant groups of SNPs selected by our method for coronary artery disease, Crohn's disease, rheumatoid arthritis, type 1 diabetes and type 2 diabetes
| Dis | Significant group of SNPs | Chr | Gene | P-value | Variance explained |
|---|---|---|---|---|---|
| CAD | rs1333049 | 9 | intergenic | 0.013 | |
| CD | rs11805303, rs2201841, rs11209033, rs12141431, rs12119179 | 1 | IL23R | 0.014 | |
| CD | rs10210302 | 2 | ATG16L1 | 0.014 | |
| CD | rs6871834, rs4957295, rs11957215, rs10213846, rs4957297, rs4957300, rs9292777, rs10512734, rs16869934 | 5 | intergenic | 0.016 | |
| CD | rs10883371 | 10 | LINC01475, NKX2-3 | 0.004 | |
| CD | rs10761659 | 10 | ZNF365 | 0.007 | |
| CD | rs2076756 | 16 | NOD2 | 0.017 | |
| CD | rs2542151 | 18 | intergenic | 0.005 | |
| RA | rs6679677 | 1 | PHTF1 | 0.031 | |
| RA | rs9272346 | 6 | HLA-DQA1 | 0.017 | |
| T1D | rs6679677 | 1 | PHTF1 | 0.03 | |
| T1D | rs17388568 | 4 | ADAD1 | 0.006 | |
| T1D | rs9272346 | 6 | HLA-DQA1 | 0.17 | |
| T1D | rs9272723 | 6 | HLA-DQA1 | 0.17 | |
| T1D | rs2523691 | 6 | intergenic | 0.004 | |
| T1D | rs11171739 | 12 | intergenic | 0.01 | |
| T1D | rs17696736 | 12 | NAA25 | 0.018 | |
| T1D | rs12924729 | 16 | CLEC16A | 0.007 | |
| T2D | rs4074720, rs10787472, rs7077039, rs11196208, rs11196205, rs10885409, rs12243326, rs4132670, rs7901695, rs4506565 | 10 | TCF7L2 | 0.015 | |
| T2D | rs9926289, rs7193144, rs8050136, rs9939609 | 16 | FTO | 0.007 |
(a) The disease identifier for which the SNP group was selected.
(b) The smallest groups of SNPs whose null hypothesis was rejected. The SNPs in each group are jointly significant. rsIDs of SNPs are from dbSNP.
(c) The chromosome to which the SNPs in the group belong.
(d) The gene to which the SNPs in the group belong, if any. Gene symbols are from Entrez Gene.
(e) The P-value of the group of SNPs, adjusted for multiple testing (controlling the FWER).
(f) The variance explained by the group of SNPs.
List of large significant groups of SNPs selected by our method for bipolar disorder
| Size of significant SNP group | Chr | P-value | Variance explained | Hits |
|---|---|---|---|---|
| 6695 (22%) | 1 | 0.027 | 0.014 | 3 out of 10 |
| 12134 (40%) | 1 | 0.047 | 0.019 | 5 out of 10 |
| 14451 (45%) | 2 | 0.016 | 0.022 | 8 out of 18 |
| 7338 (23%) | 2 | 0.036 | 0.014 | 9 out of 18 |
| 1649 (6%) | 3 | 0.021 | 0.009 | 6 out of 15 |
| 24832 (100%) | 4 | 0.008 | 0.029 | 5 out of 5 |
| 14040 (55%) | 5 | 0.030 | 0.018 | 1 out of 5 |
| 24193 (100%) | 6 | 0.041 | 0.026 | 7 out of 7 |
| 20643 (100%) | 7 | 0.013 | 0.028 | 5 out of 5 |
| 21594 (100%) | 8 | 0.027 | 0.023 | 6 out of 6 |
| 11929 (65%) | 9 | 0.009 | 0.020 | 10 out of 12 |
| 22517 (100%) | 10 | 0.021 | 0.024 | 6 out of 6 |
| 15269 (77%) | 12 | 0.038 | 0.016 | 1 out of 2 |
| 4389 (36%) | 14 | 0.048 | 0.012 | 3 out of 11 |
| 11055 (100%) | 15 | 0.032 | 0.017 | 4 out of 4 |
| 10382 (88%) | 16 | 0.047 | 0.018 | 16 out of 16 |
(a) The size of the SNP group is the number of SNPs that belong to the group. In parentheses: size as a percentage of the total genotyped SNPs on the chromosome.
(b) The chromosome to which the SNPs in the group belong.
(c) The P-value of the group of SNPs, adjusted for multiple testing (controlling the FWER).
(d) The variance explained by the group of SNPs.
(e) We counted the number of SNPs with P-values identified using PLINK (Purcell ) and checked how many of those SNPs are present in the groups selected by our method. The numbers refer to the SNPs on individual chromosomes.
Fig. 3. Variance in bipolar disorder that is explained by individual chromosomes. The variance on the vertical axis is given by the R² value of all the selected SNPs in a chromosome, as described in the Supplementary Material Section S3. The total variance explained by all the selected SNPs on all the chromosomes is 0.5.