Xiang Zhang, Shunping Huang, Fei Zou, Wei Wang.
Abstract
As a promising tool for identifying genetic markers underlying phenotypic differences, the genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene-gene interaction) is preferable to single-locus analysis, since many diseases are known to be complex traits. A brute force search is infeasible for epistasis detection at the genome-wide scale because of the intensive computational burden. Existing epistasis detection algorithms are designed for datasets consisting of homozygous markers and small sample sizes. In human studies, however, the genotype may be heterozygous, and the number of individuals can reach the thousands. Thus, existing methods are not readily applicable to human datasets. In this article, we propose an efficient algorithm, TEAM, which significantly speeds up epistasis detection for human GWAS. Our algorithm is exhaustive, i.e. it does not ignore any epistatic interaction. Utilizing the minimum spanning tree structure, the algorithm incrementally updates the contingency tables for epistatic tests without scanning all individuals. Our algorithm has broader applicability and is more efficient than existing methods for large-sample studies. It supports any statistical test that is based on contingency tables, and enables control of both the family-wise error rate and the false discovery rate. Extensive experiments show that our algorithm needs to examine only a small portion of the individuals to update the contingency tables, and that it achieves at least an order of magnitude speedup over the brute force approach.
Year: 2010 PMID: 20529910 PMCID: PMC2881371 DOI: 10.1093/bioinformatics/btq186
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
An example dataset consisting of six SNPs {X1,…, X6}, the original phenotype Y0 and five phenotype permutations {Y1,…, Y5} for 24 individuals {S1,…, S24}
| | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 | S16 | S17 | S18 | S19 | S20 | S21 | S22 | S23 | S24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 0 | 0 | 0 | 1 | 2 | 0 | 2 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 2 | 1 | 0 | 0 | 2 | 2 | 0 |
| X2 | 2 | 2 | 0 | 2 | 0 | 2 | 0 | 2 | 2 | 2 | 2 | 0 | 1 | 0 | 0 | 2 | 0 | 2 | 1 | 0 | 2 | 2 | 2 | 2 |
| X3 | 2 | 0 | 0 | 2 | 0 | 2 | 0 | 1 | 2 | 1 | 2 | 2 | 1 | 0 | 2 | 2 | 0 | 2 | 1 | 2 | 2 | 2 | 2 | 2 |
| X4 | 0 | 2 | 2 | 0 | 0 | 0 | 2 | 1 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 0 | 0 |
| X5 | 0 | 2 | 2 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 2 | 2 | 0 | 2 |
| X6 | 0 | 2 | 2 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 1 | 0 | 2 | 2 | 0 | 0 |
| Y0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Y1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| Y2 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| Y3 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Y4 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| Y5 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
Contingency table for the two-locus test ℑ(XiXj, Y)
[Table body not recovered: two phenotype rows by nine two-locus genotype combinations, one event count per cell, with row and column totals.]
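To make the layout of the two-locus table concrete, here is a minimal sketch that counts the 2 x 9 table for the SNP pair (X3, X2) against the original phenotype Y0, using the genotypes from the example dataset. The function name `two_locus_table` and the row and column ordering (rows indexed by phenotype value, columns in the order 00, 01, 02, 10, ..., 22) are assumptions made for illustration, not the paper's notation.

```python
# Genotypes of X3 and X2 and the original phenotype Y0 for S1..S24,
# transcribed from the example dataset.
X3 = [int(c) for c in "200202012122102202122222"]
X2 = [int(c) for c in "220202022220100202102222"]
Y0 = [int(c) for c in "111111111111000000000000"]


def two_locus_table(xi, xj, y):
    """Count a 2 x 9 contingency table (hypothetical layout).

    Row index is the phenotype value (0 or 1); column index is
    3 * genotype(xi) + genotype(xj), i.e. the order 00, 01, 02, 10, ..., 22.
    """
    table = [[0] * 9 for _ in range(2)]
    for a, b, phenotype in zip(xi, xj, y):
        table[phenotype][3 * a + b] += 1
    return table


table = two_locus_table(X3, X2, Y0)
for phenotype, row in enumerate(table):
    print(f"Y={phenotype}:", row, "total =", sum(row))
```

A brute force approach would repeat this full scan over all individuals for every SNP pair and every phenotype permutation, which is the cost the abstract describes as infeasible at the genome-wide scale.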
Contingency tables for the single-locus tests ℑ(Xi, Y) and ℑ(Xj, Y)
[Table bodies not recovered: each table has two phenotype rows by three genotypes, one event count per cell, with row and column totals.]
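The abstract states that any statistical test based on contingency tables is supported. As one illustration, the sketch below builds the single-locus 2 x 3 table for X3 against Y0 and computes a Pearson chi-square statistic from it. The choice of the chi-square test and the helper names `single_locus_table` and `chi_square` are assumptions for this example; the source does not say which statistic the tables feed.

```python
# Genotypes of X3 and the original phenotype Y0 for S1..S24,
# transcribed from the example dataset.
X3 = [int(c) for c in "200202012122102202122222"]
Y0 = [int(c) for c in "111111111111000000000000"]


def single_locus_table(x, y):
    """Count a 2 x 3 table: rows are phenotype 0/1, columns are genotype 0/1/2."""
    table = [[0] * 3 for _ in range(2)]
    for genotype, phenotype in zip(x, y):
        table[phenotype][genotype] += 1
    return table


def chi_square(table):
    """Pearson chi-square statistic; cells with zero expected count are skipped."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


table = single_locus_table(X3, Y0)
stat = chi_square(table)
print(table, round(stat, 4))
```

Because the statistic depends only on the table counts, any method that produces the correct counts faster yields the same test values.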
Contingency table for genotype relation between two SNPs Xi and Xj
[Table body not recovered: three genotypes of one SNP by three genotypes of the other, one event count per cell, with row and column totals.]
Fig. 1. The minimum spanning tree built on the SNPs in the example dataset shown in Table 1.
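Since the figure itself is not reproduced here, the following sketch shows one way such a tree could be built from the example dataset. It assumes that the weight of an edge between two SNPs is the number of individuals whose genotypes differ (a Hamming distance) and uses Prim's algorithm started from X3; the weighting, the algorithm, and the root are assumptions made for illustration rather than details confirmed by the source.

```python
# Genotype strings for S1..S24, transcribed from the example dataset.
SNPS = {
    "X1": "000120202002000202100220",
    "X2": "220202022220100202102222",
    "X3": "200202012122102202122222",
    "X4": "022000210220000000101200",
    "X5": "022000112120000002102202",
    "X6": "022000210120000202102200",
}


def hamming(a, b):
    """Number of individuals whose genotypes differ between two SNPs."""
    return sum(ca != cb for ca, cb in zip(a, b))


def minimum_spanning_tree(snps, root):
    """Prim's algorithm; returns (parent, child, weight) in the order edges are added."""
    in_tree = [root]
    edges = []
    while len(in_tree) < len(snps):
        weight, u, v = min(
            (hamming(snps[u], snps[v]), u, v)
            for u in in_tree
            for v in snps
            if v not in in_tree
        )
        edges.append((u, v, weight))
        in_tree.append(v)
    return edges


edges = minimum_spanning_tree(SNPS, root="X3")
total_weight = sum(w for _, _, w in edges)
print(edges, total_weight)
```

Under these assumptions the tree has total weight 35, so walking the tree touches 35 genotype changes in total, compared with 24 individuals per SNP for a full rescan.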
Genotype difference between the connected SNPs in the minimum spanning tree shown in Figure 1
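| Edge | 0→1 | 1→0 | 0→2 | 2→0 | 1→2 | 2→1 |
|---|---|---|---|---|---|---|
| (X3, X2) | ∅ | ∅ | {S2} | {S12, S15, S20} | {S8, S10} | ∅ |
| (X2, X5) | {S7} | {S13} | {S3} | {S1, S4, S6, S16, S23} | ∅ | {S8, S10} |
| (X5, X6) | ∅ | ∅ | {S16} | {S9, S24} | {S7} | ∅ |
| (X6, X1) | {S4} | {S8, S10} | {S5, S9, S12, S23} | {S2, S3, S11, S21} | ∅ | ∅ |
| (X6, X4) | ∅ | ∅ | ∅ | {S16, S18} | {S10} | {S21} |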
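The genotype differences along a tree edge can be computed directly from the example dataset. The sketch below groups the individuals whose genotype changes between two connected SNPs by the kind of change (for example 2→0). The function name `genotype_differences` and the choice of the edge (X6, X4) are illustrative assumptions.

```python
from collections import defaultdict

# Genotype strings for S1..S24, transcribed from the example dataset.
X6 = "022000210120000202102200"
X4 = "022000210220000000101200"


def genotype_differences(parent, child):
    """Map each genotype change (a, b) to the individuals that change from a to b."""
    diffs = defaultdict(list)
    for s, (a, b) in enumerate(zip(parent, child), start=1):
        if a != b:
            diffs[(int(a), int(b))].append(f"S{s}")
    return dict(diffs)


diffs = genotype_differences(X6, X4)
for (a, b), individuals in sorted(diffs.items()):
    print(f"{a}→{b}:", individuals)
```

Only four of the 24 individuals differ between X6 and X4, which is why these two SNPs are cheap to connect in a spanning tree weighted by genotype differences.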
Entries of D(X3) with empty entries omitted for all permutations in a batch mode
[Table body not recovered: four rows, each pairing a set of individual ids with the corresponding phenotype permutations.]
Updating O(X3X5) from O(X3X2) for all permutations in a batch mode
| 1 | 1 | 1 | 2 | 1 | |
| 1 | 1 | 1 | 2 | 1 | |
| 1 | 2 | 2 | 2 | 1 | |
| 1 | 3 | 3 | 3 | 2 | |
| 0 | 2 | 3 | 2 | 1 |
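The idea of updating one two-locus table from a neighbouring one can be sketched as follows for a single phenotype. Starting from the counts for (X3, X2), only the individuals whose genotype differs between X2 and X5 are moved to their new cells, which yields the counts for (X3, X5). This is a simplified illustration under an assumed 2 x 9 table layout, with hypothetical helper names; it does not reproduce the batch-mode bookkeeping over all permutations that the caption refers to.

```python
# Genotypes and the original phenotype for S1..S24, from the example dataset.
X3 = [int(c) for c in "200202012122102202122222"]
X2 = [int(c) for c in "220202022220100202102222"]
X5 = [int(c) for c in "022000112120000002102202"]
Y0 = [int(c) for c in "111111111111000000000000"]


def two_locus_table(xi, xj, y):
    """Full scan: rows are phenotype 0/1, columns are 3 * xi + xj."""
    table = [[0] * 9 for _ in range(2)]
    for a, b, phenotype in zip(xi, xj, y):
        table[phenotype][3 * a + b] += 1
    return table


def update_table(table, xi, x_old, x_new, y, changed):
    """Derive the table for (xi, x_new) from the table for (xi, x_old),
    visiting only the individuals listed in `changed`."""
    updated = [row[:] for row in table]
    for s in changed:
        updated[y[s]][3 * xi[s] + x_old[s]] -= 1
        updated[y[s]][3 * xi[s] + x_new[s]] += 1
    return updated


# Individuals (0-based) that differ along the tree edge; computed once per edge.
changed = [s for s in range(len(X2)) if X2[s] != X5[s]]

table_32 = two_locus_table(X3, X2, Y0)
table_35 = update_table(table_32, X3, X2, X5, Y0, changed)
print(len(changed), "of", len(X2), "individuals visited")
```

The updated table matches a full recount for (X3, X5) while visiting 10 of the 24 individuals; the list `changed` depends only on the edge, so it can be reused for every partner SNP and every permutation.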
Fig. 2. Comparison between TEAM and the brute force approach on human datasets under various experimental settings: varying the number of SNPs (a), individuals (b), permutations (c) and varying the case/control ratio (d).
The tree weight and the proportion of the individuals pruned by TEAM on the human datasets
| Setting | Value | TEAM: tree weight (%) | TEAM: pruning ratio (%) | Updating by random tree: tree weight (%) | Updating by random tree: pruning ratio (%) | Updating by linear tree: tree weight (%) | Updating by linear tree: pruning ratio (%) |
|---|---|---|---|---|---|---|---|
| No. of SNPs | 10 K | 17.721 | 94.104 | 53.326 | 88.722 | 53.158 | 89.210 |
| | 20 K | 18.692 | 93.981 | 52.881 | 88.895 | 52.851 | 89.390 |
| | 30 K | 19.314 | 93.802 | 53.011 | 88.823 | 52.946 | 89.380 |
| No. of Individuals | 200 | 16.641 | 94.376 | 53.358 | 88.749 | 53.179 | 89.205 |
| | 300 | 17.342 | 94.209 | 53.343 | 88.730 | 53.142 | 89.213 |
| | 400 | 17.721 | 94.104 | 53.326 | 88.722 | 53.158 | 89.210 |
| No. of Permutations | 100 | 17.721 | 94.104 | 53.326 | 88.722 | 53.158 | 89.210 |
| | 300 | 17.721 | 94.105 | 53.326 | 88.724 | 53.158 | 89.212 |
| | 500 | 17.721 | 94.104 | 53.326 | 88.724 | 53.158 | 89.212 |
| Case/control ratio | 100/300 | 17.721 | 97.049 | 53.326 | 94.355 | 53.158 | 94.599 |
| | 200/200 | 17.721 | 94.104 | 53.326 | 88.722 | 53.158 | 89.210 |
| | 300/100 | 17.721 | 97.049 | 53.326 | 94.355 | 53.158 | 94.599 |
Fig. 3. Comparison between TEAM, COE and the brute force approach on mouse datasets under various experimental settings: (a) varying the number of SNPs and (b) varying the number of individuals.
Identified significant SNP pairs in the simulated human GWAS datasets
| Dataset | Significant SNP-pair | Chromosome and location | FDR | FWER |
|---|---|---|---|---|
| 1 | (rs768529, rs3804940)* | (chr1: 51946762, chr3: 7520545) | 0.00067 | 0 |
| | (rs768529, rs756084) | (chr1: 51946762, chr3: 7536149) | 0.00067 | 0 |
| | (rs768529, rs779742) | (chr1: 51946762, chr3: 7558058) | 0.00067 | 0 |
| | (rs768529, rs1872393) | (chr1: 51946762, chr3: 7546236) | 0.00067 | 0.004 |
| | (rs768529, rs779744) | (chr1: 51946762, chr3: 7555121) | 0.00067 | 0.004 |
| | (rs768529, rs6764561) | (chr1: 51946762, chr3: 7514592) | 0.00067 | 0.004 |
| 2 | (rs10495728, rs521882)* | (chr2: 22811773, chr8: 16688797) | 0.004 | 0.004 |
| 3 | (rs1016836, rs2783130)* | (chr10: 31935845, chr13: 79068161) | 0 | 0 |
| 4 | (rs648519, rs1012273)* | (chr11: 98972936, chr16: 58525067) | 0.002 | 0.002 |