Hai-Ming Xu, Xi-Wei Sun, Ting Qi, Wan-Yu Lin, Nianjun Liu, Xiang-Yang Lou.
Abstract
The elusive but ubiquitous multifactor interactions represent a stumbling block that urgently needs to be removed in the search for determinants of complex human diseases. Dimensionality reduction approaches are a promising tool for this task. Many complex diseases exhibit composite syndromes that must be measured as a cluster of clinical traits with varying correlations, and/or are inherently longitudinal in nature (changing over time and measured dynamically at multiple time points). A multivariate approach for detecting interactions is thus greatly needed for handling multifaceted phenotypes and longitudinal data, as well as for improving statistical power under multiple significance testing via a two-stage testing procedure: a multivariate analysis for grouped phenotypes, followed by univariate analyses for the phenotypes in the significant group(s). In this article, we propose a multivariate extension of generalized multifactor dimensionality reduction (GMDR) based on multivariate generalized linear, multivariate quasi-likelihood and generalized estimating equations models. Simulations and a real data analysis of the cohort from the Study of Addiction: Genetics and Environment are performed to investigate the properties and performance of the proposed method, as compared with the univariate method. The results suggest that the proposed multivariate GMDR substantially boosts statistical power.
Year: 2014 PMID: 25259584 PMCID: PMC4178067 DOI: 10.1371/journal.pone.0108103
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Type I error rates for GEE-GMDR and GMDR methods.
| Method | GEE-GMDR | | | GMDR-T1 | | | GMDR-T2 | | |
| Significance level | Digenic | Trigenic | Tetragenic | Digenic | Trigenic | Tetragenic | Digenic | Trigenic | Tetragenic |
| 0.05 | 0.040 | 0.054 | 0.048 | 0.052 | 0.053 | 0.050 | 0.050 | 0.056 | 0.047 |
| 0.01 | 0.006 | 0.008 | 0.011 | 0.014 | 0.010 | 0.010 | 0.006 | 0.014 | 0.007 |
GEE-GMDR is the GEE-GMDR analysis for the simulated bivariate traits, GMDR-T1 is the univariate GMDR analysis for trait 1, and GMDR-T2 is the univariate GMDR analysis for trait 2.
Figure 1. Quantile-quantile plot of significance level and Type I error rate.
The Type I error is evaluated by the GEE-GMDR method with digenic, trigenic and tetragenic models in the absence of both gene-gene interaction and residual correlation. The reference line is a diagonal line with unit slope through the origin. An unbiased method is expected to give points falling on or near the reference line (i.e., the Type I error rate is very close to the nominal level).
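The points in a plot of this kind can be computed as below. This is a minimal sketch, not the authors' code: the function name `empirical_type1_error`, the grid of nominal levels, and the uniformly distributed placeholder p-values are all assumptions made for illustration.

```python
import random

def empirical_type1_error(null_p_values, alpha):
    """Fraction of null-simulation p-values at or below the nominal level alpha."""
    return sum(p <= alpha for p in null_p_values) / len(null_p_values)

# Hypothetical null p-values (assumption): under a valid test with no true
# effect they are roughly uniform on [0, 1].
rng = random.Random(0)
null_p = [rng.random() for _ in range(10000)]

# One point per nominal level: x = nominal level, y = empirical error rate.
levels = [0.01, 0.05, 0.10, 0.20]
qq_points = [(a, empirical_type1_error(null_p, a)) for a in levels]

for nominal, observed in qq_points:
    print(f"nominal={nominal:.2f} observed={observed:.3f}")
```

Points close to the line y = x correspond to the unbiased behaviour described in the caption.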
Comparison of Cross-Validation Consistency (CVC) and Test Accuracy (TA) between GEE-GMDR and the GMDR method for two simulated continuous traits.
| | GEE-GMDR | | GMDR-T1 | | GMDR-T2 | |
| Model and No. of loci | CVC | TA | CVC | TA | CVC | TA |
| Digenic | | | | | | |
| 1 | 9.010±1.446 | .573±.025 | 8.445±1.886 | .546±.035 | 8.500±1.616 | .546±.027 |
| 2 | | | | | | |
| 3 | 6.680±2.056 | .647±.024 | 6.790±2.147 | .597±.032 | 6.460±2.025 | .594±.029 |
| 4 | 5.550±2.126 | .624±.029 | 5.485±2.141 | .575±.037 | 5.435±2.071 | .576±.031 |
| 5 | 5.230±2.131 | .603±.033 | 5.105±1.911 | .557±.035 | 4.820±2.066 | .554±.032 |
| 6 | 5.050±2.017 | .583±.033 | 4.695±1.844 | .543±.033 | 4.765±1.915 | .542±.034 |
| 7 | 4.970±1.977 | .562±.035 | 4.825±2.165 | .530±.037 | 5.195±1.999 | .535±.034 |
| 8 | 5.340±2.036 | .550±.040 | 5.465±2.140 | .528±.044 | 5.250±2.034 | .526±.038 |
| 9 | 6.705±2.110 | .545±.050 | 6.500±2.088 | .524±.050 | 6.720±2.178 | .533±.052 |
| Trigenic | | | | | | |
| 1 | 8.625±1.700 | .551±.029 | 8.095±2.017 | .533±.034 | 7.720±2.020 | .527±.031 |
| 2 | 7.805±2.017 | .568±.031 | 6.655±2.142 | .536±.034 | 6.535±2.117 | .530±.034 |
| 3 | | | | | | |
| 4 | 6.765±2.069 | .607±.027 | 6.320±2.250 | .559±.037 | 6.090±2.110 | .556±.033 |
| 5 | 5.685±1.996 | .582±.030 | 5.210±1.989 | .541±.035 | 5.020±2.096 | .540±.034 |
| 6 | 5.375±2.075 | .565±.033 | 4.720±2.008 | .529±.035 | 4.805±1.922 | .527±.033 |
| 7 | 5.090±2.048 | .544±.035 | 4.585±2.016 | .520±.035 | 4.940±2.041 | .522±.034 |
| 8 | 5.395±2.020 | .534±.041 | 5.460±2.083 | .521±.041 | 5.335±2.169 | .519±.041 |
| 9 | 6.770±2.066 | .535±.052 | 6.545±2.147 | .518±.052 | 6.740±2.247 | .526±.050 |
| Tetragenic | | | | | | |
| 1 | 8.060±1.764 | .537±.029 | 7.730±2.076 | .524±.036 | 7.465±2.025 | .537±.029 |
| 2 | 7.000±2.271 | .546±.034 | 6.415±2.284 | .526±.036 | 5.870±2.251 | .518±.035 |
| 3 | 6.330±2.291 | .548±.037 | 5.285±2.302 | .522±.035 | 5.245±2.135 | .519±.033 |
| 4 | | | | | | |
| 5 | 6.105±2.125 | .564±.037 | 4.865±2.126 | .524±.036 | 4.980±1.985 | .521±.030 |
| 6 | 5.190±2.068 | .543±.038 | 4.675±1.995 | .519±.033 | 4.605±1.967 | .517±.035 |
| 7 | 4.860±1.881 | .530±.036 | 4.830±1.881 | .516±.032 | 4.840±2.087 | .515±.035 |
| 8 | 5.280±2.101 | .525±.042 | 5.350±2.109 | .514±.041 | 5.195±2.034 | .511±.038 |
| 9 | 6.845±2.030 | .527±.050 | 6.470±2.126 | .515±.051 | 6.725±2.203 | .521±.050 |
Digenic model: the genotypes with two uppercase-letter alleles (i.e., AAbb, AaBb, aaBB) are set as the high-risk group and the rest as the low-risk group.
Trigenic model: the genotypes with three uppercase-letter alleles are set as the high-risk group and the rest as the low-risk group.
Tetragenic model: the genotypes with four uppercase-letter alleles are set as the high-risk group and the rest as the low-risk group.
GEE-GMDR is the GEE-GMDR analysis for the simulated bivariate traits.
GMDR-T1 is the univariate GMDR analysis for trait 1.
GMDR-T2 is the univariate GMDR analysis for trait 2.
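The high-risk rule stated in the footnotes above (a genotype is high-risk when its number of uppercase-letter alleles is two, three or four) can be sketched as follows. The helper names and the locus letters A to D are hypothetical, and the sketch assumes that a k-locus model uses k as the required allele count, which matches the two-locus example AAbb, AaBb, aaBB.

```python
from itertools import product

def uppercase_allele_count(genotype):
    """Count uppercase-letter alleles in a genotype string such as 'AaBb'."""
    return sum(1 for allele in genotype if allele.isupper())

def is_high_risk(genotype, n_required):
    """High-risk if the genotype carries exactly n_required uppercase alleles."""
    return uppercase_allele_count(genotype) == n_required

def multilocus_genotypes(n_loci):
    """Enumerate all 3**n_loci genotypes for biallelic loci labelled A, B, C, D."""
    letters = "ABCD"[:n_loci]
    per_locus = [[u + u, u + u.lower(), u.lower() + u.lower()] for u in letters]
    return ["".join(combo) for combo in product(*per_locus)]

digenic_high = [g for g in multilocus_genotypes(2) if is_high_risk(g, 2)]
print(digenic_high)  # ['AAbb', 'AaBb', 'aaBB']
```

Under the same assumption, the three-locus and four-locus models yield 7 of 27 and 19 of 81 high-risk genotypes, respectively.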
Figure 2. Comparison of statistical power between the univariate GMDR method and GEE-GMDR under digenic, trigenic and tetragenic interaction models.
The horizontal axis represents different residual correlations. The empirical statistical power is defined as the proportion of significant true models at the 5% level in 200 simulations.
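The definition of empirical power given in this caption can be written out as a short calculation. This is a sketch under one interpretation, namely that a replicate counts toward power when the selected best model equals the true model and its p-value is at most 0.05. The function name, the SNP labels and the replicate outcomes are hypothetical.

```python
def empirical_power(replicates, true_model, alpha=0.05):
    """Proportion of replicates whose best model is the true one and is significant.

    replicates: list of (best_model, p_value) pairs, one per simulated data set.
    """
    hits = sum(
        1 for best_model, p in replicates
        if best_model == true_model and p <= alpha
    )
    return hits / len(replicates)

# Hypothetical outcomes for 200 simulated data sets (assumption, not source data).
true_model = ("snp1", "snp2")
replicates = (
    [(true_model, 0.01)] * 150            # true model found and significant
    + [(true_model, 0.20)] * 30           # true model found, not significant
    + [(("snp1", "snp3"), 0.01)] * 20     # wrong model selected
)

power = empirical_power(replicates, true_model)
print(power)  # 0.75
```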
Figure 3. Principal components analysis for SAGE.
The first two principal components are plotted to represent the genetic background of the SAGE sample.
Interaction SNPs detected among CHRNB2, NTRK2, BDNF, and CHRNA4.
| Model | TA | p sign | p perm |
| rs2072660-rs1209068-rs11030134-rs6011770 | | | |
| GEE-GMDR | .5780 | <10e-04 | 2.62e-04 |
| GMDR FTND | .5411 | 6.30e-02 | 9.87e-02 |
| GMDR ND | .5283 | 2.41e-02 | 2.40e-01 |
| GMDR MC | .5128 | 5.40e-01 | 7.86e-01 |
In the model, from left to right, the SNPs are located in CHRNB2, NTRK2, BDNF, CHRNA4, respectively.
TA denotes test accuracy.
p sign values were from the sign test after Bonferroni correction.
p perm values were from the permutation test after Bonferroni correction.
FTND denotes the Fagerstrom Test for Nicotine Dependence.
ND denotes DSM-IV nicotine dependence.
MC denotes the maximum number of cigarettes.
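The footnotes mention a sign test and a Bonferroni correction without giving formulas. The sketch below shows one common form of each, stated as an assumption: a one-sided sign test that counts cross-validation folds with test accuracy above 0.5 and refers the count to a Binomial(n, 0.5) distribution, followed by a Bonferroni adjustment that multiplies by the number of tests and caps the result at 1. The fold accuracies and the number of tests are hypothetical, not values from the study.

```python
from math import comb

def sign_test_p(test_accuracies, chance=0.5):
    """One-sided sign test: P(X >= k) for X ~ Binomial(n, 0.5),
    where k is the number of folds with accuracy above chance."""
    n = len(test_accuracies)
    k = sum(1 for ta in test_accuracies if ta > chance)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical test accuracies from 10 cross-validation folds (assumption).
fold_accuracies = [0.58, 0.61, 0.55, 0.57, 0.60, 0.56, 0.59, 0.62, 0.54, 0.58]
raw_p = sign_test_p(fold_accuracies)       # all 10 folds above 0.5: 1/1024
adjusted_p = bonferroni(raw_p, n_tests=3)  # n_tests=3 is an illustrative choice
print(raw_p, adjusted_p)
```

With 10 folds the smallest attainable unadjusted sign-test p-value in this form is 1/1024, which is why a permutation test can give finer resolution.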
Information on the SNPs in the best model identified using GEE-GMDR method.
| SNP | Gene | Chromosome | Domain | Physical Position | Alleles | Reported MAF |
| rs2072660 | CHRNB2 | 1 | 3′ UTR | 152815345 | C/ | .319 |
| rs1209068 | NTRK2 | 9 | Intron | 86530338 | C/ | .096 |
| rs11030134 | BDNF | 11 | 5′ Flanking | 27743050 | | .282 |
| rs6011770 | CHRNA4 | 20 | Intron | 61447875 | C/ | .054 |
The nucleotide of each SNP shown in bold represents the minor allele as given in dbSNP (build 138).
The minor allele frequency (MAF) is as presented in dbSNP (build 138).
Figure 4. The interaction pattern among rs2072660-rs1209068-rs11030134-rs6011770.
The left bar in each nonempty cell denotes a positive score and the right bar a negative score. High-risk cells are indicated by dark shading, low-risk cells by light shading, and empty cells by no shading. Note that the patterns of high-risk and low-risk cells differ across each of the different multilocus dimensions, presenting evidence of epistasis.
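The cell labelling described in this caption can be illustrated with a small sketch. It assumes that each subject contributes a score to the cell of its multilocus genotype, that positive and negative scores are summed separately (the two bars), and that a cell is called high-risk when the positive sum is at least as large as the negative sum. The zero threshold, the function name and the toy data are assumptions, not details given in the source.

```python
def classify_cells(genotypes, scores, all_cells):
    """Label each genotype cell as 'high-risk', 'low-risk' or 'empty'.

    Returns {cell: (label, positive_sum, negative_sum)}.
    """
    pos, neg = {}, {}
    for g, s in zip(genotypes, scores):
        if s >= 0:
            pos[g] = pos.get(g, 0.0) + s      # height of the left bar
        else:
            neg[g] = neg.get(g, 0.0) - s      # height of the right bar
    result = {}
    for cell in all_cells:
        p, n = pos.get(cell, 0.0), neg.get(cell, 0.0)
        if cell not in pos and cell not in neg:
            result[cell] = ("empty", 0.0, 0.0)
        elif p >= n:                          # assumed threshold of zero net score
            result[cell] = ("high-risk", p, n)
        else:
            result[cell] = ("low-risk", p, n)
    return result

# Toy data (hypothetical): four subjects, four possible cells.
genotypes = ["AABB", "AABB", "AaBb", "aabb"]
scores = [0.8, -0.3, -0.5, 0.2]
cells = classify_cells(genotypes, scores, ["AABB", "AaBb", "aabb", "AAbb"])
for cell, (label, p, n) in cells.items():
    print(cell, label, p, n)
```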