Yun Joo Yoo, Sun Ah Kim, Shelley B Bull.
Abstract
Gene-based analysis of multiple single nucleotide polymorphisms (SNPs) in a gene region is an alternative to single SNP analysis. The multi-bin linear combination test (MLC) proposed in previous studies utilizes the correlation among SNPs within a gene to construct a gene-based global test. SNPs are partitioned into clusters of highly correlated SNPs, and the MLC test statistic quadratically combines linear combination statistics constructed for each cluster. The test has degrees of freedom equal to the number of clusters and can be more powerful than a fully quadratic or fully linear test statistic. In this study, we develop a new SNP clustering algorithm designed to find cliques, which are complete subnetworks of SNPs with all pairwise correlations above a threshold. We evaluate the performance of the MLC test using the clique-based CLQ algorithm versus using the tag-SNP-based LDSelect algorithm. In our numerical power calculations we observed that the two clustering algorithms produce identical clusters about 40 to 60% of the time, yielding similar power on average. However, because the CLQ algorithm tends to produce smaller clusters with stronger positive correlation, the MLC test is less likely to be affected by the occurrence of opposing signs in the individual SNP effect coefficients.
Year: 2015 PMID: 26346579 PMCID: PMC4539439 DOI: 10.1155/2015/852341
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
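The clique idea in the abstract (every pair of SNPs in a cluster must be correlated above a threshold) can be illustrated with a small greedy partition. This is only a sketch under stated assumptions: it is not the published CLQ algorithm, the function name `greedy_clique_clusters` is hypothetical, and it assumes a signed pairwise correlation matrix compared directly against the threshold.

```python
def greedy_clique_clusters(corr, threshold):
    """Partition SNP indices so that every pair inside a cluster has
    correlation above `threshold`. Greedy heuristic for illustration only;
    not the CLQ algorithm evaluated in the paper."""
    unassigned = set(range(len(corr)))
    clusters = []
    while unassigned:
        # Seed with the unassigned SNP that has the most above-threshold neighbours.
        def degree(i):
            return sum(1 for j in unassigned if j != i and corr[i][j] > threshold)
        seed = max(sorted(unassigned), key=degree)
        clique = [seed]
        # Visit candidates by decreasing correlation with the seed; accept one
        # only if it exceeds the threshold with every current clique member.
        for j in sorted(unassigned - {seed}, key=lambda j: -corr[seed][j]):
            if all(corr[j][k] > threshold for k in clique):
                clique.append(j)
        clusters.append(sorted(clique))
        unassigned -= set(clique)
    return clusters


# Hypothetical 4-SNP correlation matrix: SNP 0 is correlated with SNPs 1 and 2,
# but SNPs 1 and 2 are only weakly correlated with each other.
corr = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.5, 0.2],
    [0.8, 0.5, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
print(greedy_clique_clusters(corr, 0.7))  # [[0, 1], [2], [3]]
```

In this toy example SNP 2 is kept out of the first cluster because its correlation with SNP 1 is below the threshold, even though it is well correlated with SNP 0. That is the sense in which clique-based clusters tend to be smaller and more uniformly correlated.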
Quantitative trait models used for power comparisons of MLC-LD and MLC-CL.
| Model name | Description | Trait model parameters* |
|---|---|---|
| Model A | One causal SNP within a gene | |
| Model B | Two causal SNPs, both deleterious | |
| Model C | Two causal SNPs, one deleterious and one protective | |
| Model D | 1 to 4 causal SNPs, random assignment of the direction of effects | |
*The trait model is $Y = \sum_{i=1}^{C} b_i G_i + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$, $C$ is the number of causal SNPs, $b_i$ is the effect of the $i$th causal SNP, and $G_i$ is the number of causal alleles for the $i$th causal SNP. The variance $\sigma^2$ is adjusted so that the power of the Wald test is 60% for each set of causal SNPs for Models A, B, and C, and is set to 1 for Model D.
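The trait model in the footnote can be simulated directly. The following is a minimal sketch, not the authors' simulation code: the effect sizes, allele frequencies, and function names are hypothetical, and genotypes are drawn independently under Hardy-Weinberg proportions, which ignores the correlation among SNPs that the paper is concerned with.

```python
import random


def simulate_genotype(freq, rng):
    """Number of causal alleles (0, 1 or 2) at one SNP.
    Assumption: two independent allele draws (Hardy-Weinberg proportions)."""
    return sum(1 for _ in range(2) if rng.random() < freq)


def simulate_trait(genotypes, effects, sigma, rng):
    """Y = sum_i b_i * G_i + eps, with eps ~ N(0, sigma^2), for one individual."""
    signal = sum(b * g for b, g in zip(effects, genotypes))
    return signal + rng.gauss(0.0, sigma)


rng = random.Random(0)
effects = [0.5, -0.5]  # hypothetical: opposite signs, in the spirit of Model C
freqs = [0.2, 0.3]     # hypothetical causal allele frequencies

sample = []
for _ in range(1000):
    g = [simulate_genotype(f, rng) for f in freqs]
    sample.append((g, simulate_trait(g, effects, sigma=1.0, rng=rng)))
```

Here `sigma=1.0` mirrors the fixed variance stated for Model D; for Models A, B, and C the source instead tunes the variance to reach 60% Wald test power, which this sketch does not attempt.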
Figure 1. Clustering of gene ARHGAP29 by LDSelect and CLQ for a threshold value of 0.7.
Mean and standard deviation over 1000 genes of the agreement measures S and S′ for the two clustering methods (LDSelect and CLQ), and the number of genes with identical clustering.
| Allele frequency cut | Threshold c | S (mean) | S (SD) | S′ (mean) | S′ (SD) | Cases of perfect agreement |
|---|---|---|---|---|---|---|
| 0.05 | 0.3 | 0.676 | 0.203 | 0.325 | 0.342 | 180 |
| 0.05 | 0.4 | 0.769 | 0.191 | 0.510 | 0.336 | 283 |
| 0.05 | 0.5 | 0.847 | 0.168 | 0.665 | 0.303 | 388 |
| 0.05 | 0.6 | 0.909 | 0.123 | 0.781 | 0.242 | 483 |
| 0.05 | 0.7 | 0.936 | 0.101 | 0.832 | 0.210 | 541 |
| 0.05 | 0.8 | 0.959 | 0.086 | 0.884 | 0.178 | 648 |
| 0.05 | 0.9 | 0.974 | 0.069 | 0.918 | 0.156 | 736 |
| 0.01 | 0.3 | 0.689 | 0.196 | 0.395 | 0.376 | 155 |
| 0.01 | 0.4 | 0.789 | 0.177 | 0.559 | 0.379 | 254 |
| 0.01 | 0.5 | 0.863 | 0.151 | 0.687 | 0.338 | 361 |
| 0.01 | 0.6 | 0.923 | 0.105 | 0.794 | 0.267 | 468 |
| 0.01 | 0.7 | 0.948 | 0.084 | 0.843 | 0.230 | 536 |
| 0.01 | 0.8 | 0.968 | 0.068 | 0.892 | 0.201 | 644 |
| 0.01 | 0.9 | 0.981 | 0.053 | 0.922 | 0.172 | 744 |
The average over 1000 genes of the number of clusters per gene, the mean size of the clusters within a gene, and the standard deviation of the cluster sizes within a gene for two clustering methods (LDSelect and CLQ).
| Allele frequency cut | Threshold c | # of clusters* (LDSelect) | # of clusters* (CLQ) | Mean size of clusters* (LDSelect) | Mean size of clusters* (CLQ) | SD size of clusters* (LDSelect) | SD size of clusters* (CLQ) |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.3 | 1.84 | 2.94 | 6.39 | 3.70 | | |
| 0.05 | 0.4 | 2.43 | 3.27 | 4.82 | 3.40 | 2.82 | 2.33 |
| 0.05 | 0.5 | 3.02 | 3.67 | 3.85 | 3.06 | 2.49 | 2.02 |
| 0.05 | 0.6 | 3.65 | 4.19 | 3.13 | 2.67 | 2.13 | 1.75 |
| 0.05 | 0.7 | 4.36 | 4.79 | 2.61 | 2.32 | 1.75 | 1.46 |
| 0.05 | 0.8 | 5.18 | 5.52 | 2.18 | 2.01 | 1.38 | 1.19 |
| 0.05 | 0.9 | 6.29 | 6.57 | 1.76 | 1.65 | 1.00 | 0.89 |
| 0.01 | 0.3 | 2.40 | 3.53 | 5.25 | 3.28 | 3.33 | 2.53 |
| 0.01 | 0.4 | 3.02 | 3.91 | 4.08 | 3.00 | 3.02 | 2.25 |
| 0.01 | 0.5 | 3.66 | 4.36 | 3.32 | 2.71 | 2.52 | 1.96 |
| 0.01 | 0.6 | 4.35 | 4.90 | 2.75 | 2.40 | 2.07 | 1.67 |
| 0.01 | 0.7 | 5.10 | 5.57 | 2.34 | 2.10 | 1.66 | 1.40 |
| 0.01 | 0.8 | 5.99 | 6.32 | 1.98 | 1.85 | 1.31 | 1.15 |
| 0.01 | 0.9 | 7.17 | 7.42 | 1.64 | 1.56 | 0.94 | 0.85 |
*Within-gene differences in these characteristics between the two methods were compared by paired t-test; all results were significant with P values < 1e−10 except the italic pairs (P = 0.61).
Figure 2. Averages of (a) the ratio of number of clusters to number of SNPs, (b) the size of the largest cluster, and (c) the number of singleton clusters in each of 1000 genes for LDSelect and CLQ clustering given a threshold value c.
Figure 3. Averages of (a) the size of the largest cluster and (b) the number of singleton clusters produced in each gene by LDSelect and CLQ for a fixed number of clusters per gene. For each gene, the number of clusters produced by each clustering method was found at threshold values on a grid from 0.1 to 0.9 in steps of 0.01. After excluding results at the extremes (i.e., retaining only cluster numbers between 20% and 90% of the number of SNPs), the LDSelect and CLQ cluster numbers were matched for each gene, and the maximum cluster size for each method was averaged across genes at a fixed value of the number of clusters.
Average MLC test power over all gene-causal-SNP combinations for LDSelect (MLC-LD) and CLQ (MLC-CL) clustering methods and the proportion of genes where MLC-LD power and MLC-CL power are less than Wald test power.
| Model | Threshold c | All possible causal SNPs and all genes | | | | | All possible causal SNPs for the genes where LDSelect and CLQ clusters are different | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | Average†,* (LDS) | Average†,* (CLQ) | % Power < Wald* (LDS) | % Power < Wald* (CLQ) | | Average†,* (LDS) | Average†,* (CLQ) | % Power < Wald* (LDS) | % Power < Wald* (CLQ) |
| A | 0.3 | 11,117 | 0.627 | 0.757 | 36.6 | 6.2 | 9,765 | 0.614 | 0.759 | 40.0 | 5.8 |
| A | 0.4 | 11,117 | 0.670 | | 26.4 | 3.9 | 8,867 | 0.656 | | 30.5 | 3.3 |
| A | 0.5 | 11,117 | 0.716 | 0.754 | 14.6 | 2.2 | 8,069 | 0.714 | 0.759 | 17.2 | 1.8 |
| A | 0.6 | 11,117 | | 0.745 | 6.7 | 1.0 | 7,381 | 0.742 | 0.753 | 8.1 | 0.7 |
| A | 0.7 | 11,117 | 0.733 | 0.730 | 2.7 | 0.6 | 6,234 | | 0.744 | 3.4 | 0.2 |
| A | 0.8 | 11,117 | 0.719 | 0.712 | 1.1 | 0.6 | 5,138 | 0.746 | 0.731 | 0.8 | 0.0 |
| A | 0.9 | 11,117 | 0.691 | 0.685 | 1.4 | 1.3 | 3,512 | 0.726 | 0.707 | 0.3 | 0.0 |
| B | 0.3 | 79,650 | 0.645 | 0.771 | 33.7 | 5.6 | 74,715 | 0.640 | 0.774 | 35.0 | 5.2 |
| B | 0.4 | 79,650 | 0.682 | | 25.5 | 3.6 | 70,384 | 0.674 | | 27.3 | 3.0 |
| B | 0.5 | 79,650 | 0.727 | 0.769 | 14.5 | 2.1 | 66,788 | 0.723 | 0.770 | 15.8 | 1.7 |
| B | 0.6 | 79,650 | | 0.760 | 6.4 | 1.2 | 63,848 | 0.752 | 0.764 | 7.0 | 0.9 |
| B | 0.7 | 79,650 | 0.748 | 0.745 | 3.0 | 0.6 | 57,300 | | 0.752 | 3.5 | 0.5 |
| B | 0.8 | 79,650 | 0.733 | 0.724 | 0.9 | 0.4 | 48,577 | 0.752 | 0.737 | 0.7 | 0.2 |
| B | 0.9 | 79,650 | 0.701 | 0.692 | 0.9 | 0.5 | 33,403 | 0.724 | 0.706 | 0.8 | 0.1 |
| C | 0.3 | 79,650 | 0.499 | 0.649 | 54.3 | 23.7 | 74,710 | 0.505 | 0.663 | 54.2 | 21.9 |
| C | 0.4 | 79,650 | 0.551 | 0.657 | 44.1 | 21.1 | 70,409 | 0.557 | 0.675 | 44.0 | 18.6 |
| C | 0.5 | 79,650 | 0.603 | 0.662 | 32.8 | 18.4 | 66,772 | 0.615 | | 32.0 | 15.5 |
| C | 0.6 | 79,650 | 0.637 | | 23.7 | 16.4 | 63,910 | 0.651 | 0.682 | 22.8 | 14.1 |
| C | 0.7 | 79,650 | 0.652 | 0.662 | 18.1 | 14.1 | 57,409 | 0.669 | 0.682 | 17.4 | 12.2 |
| C | 0.8 | 79,650 | | 0.657 | 14.1 | 11.7 | 48,669 | | 0.678 | 13.8 | 10.3 |
| C | 0.9 | 79,650 | 0.645 | 0.646 | 10.3 | 8.8 | 33,625 | 0.661 | 0.662 | 11.6 | 8.3 |
| D** | 0.3 | 8,883 | 0.388 | 0.444 | 36.5 | 12.1 | 7,054 | 0.372 | | 39.7 | 9.9 |
| D** | 0.4 | 8,883 | 0.408 | 0.447 | 28.3 | 9.7 | 6,140 | 0.389 | 0.440 | 32.5 | 7.3 |
| D** | 0.5 | 8,883 | 0.426 | | 18.8 | 7.4 | 5,119 | 0.404 | 0.433 | 22.1 | 5.0 |
| D** | 0.6 | 8,883 | | 0.445 | 10.8 | 5.5 | 4,420 | | 0.435 | 12.5 | 3.5 |
| D** | 0.7 | 8,883 | | | 6.5 | 4.2 | 3,625 | | | 7.4 | 3.0 |
| D** | 0.8 | 8,883 | 0.435 | 0.433 | 4.4 | 3.3 | 2,827 | 0.419 | 0.412 | 4.1 | 1.7 |
| D** | 0.9 | 8,883 | 0.425 | 0.423 | 3.6 | 3.3 | 2,103 | 0.406 | 0.396 | 2.7 | 1.7 |
*Within genes, the differences in power between the two clustering algorithms were compared by paired t-test, and the proportions of cases with MLC test power below the Wald test power were compared by McNemar test; all results are significant with P values < 0.05 except the italic pairs.
**The power of the Wald test for Models A, B, and C was fixed at 0.6, whereas for Model D the average Wald test power was 0.388 over all genes (left) and 0.377, 0.373, 0.365, 0.368, 0.354, 0.356, and 0.351 for c = 0.3 to 0.9, respectively, over the genes whose clustering results differ (right).
†Bolded numbers are the maximum average power of MLC over different threshold values within each clustering method, trait model, and the set of genes (all or the ones with different clustering results by LDSelect and CLQ).
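The two paired comparisons named in the table footnotes can be sketched with the standard library alone. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the McNemar statistic is computed without a continuity correction (the source does not say which form was used), and the p-value relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math


def paired_t_statistic(x, y):
    """Paired t statistic for matched observations, e.g. per-gene power
    under two clustering methods. Has len(x) - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n)


def mcnemar(b, c):
    """McNemar test from the two discordant counts:
    b = cases where only method 1 falls below the reference,
    c = cases where only method 2 does.
    Returns (chi-square with 1 df, two-sided p-value); no continuity correction."""
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p
```

Only discordant pairs enter the McNemar statistic, so cases where both methods fall on the same side of the Wald test power do not affect the comparison.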
Figure 4. Power of the MLC test based on LDSelect clustering (MLC-LD: blue) and CLQ clustering (MLC-CL: red) with threshold c = 0.7 for the cases where LDSelect and CLQ clusters are different. Each point represents one case among all possible causal SNP assignments within a gene. For Models A, B, and C, the variance of the error term was adjusted for each of the 1000 genes so that the Wald test power is exactly 0.6 (grey); for Model D it was fixed at 1.