
Detecting Gene-Gene Interactions Associated with Multiple Complex Traits with U-Statistics.

Ming Li¹, Changshuai Wei¹, Yalu Wen¹, Tong Wang¹, Qing Lu¹.

Abstract

Many complex diseases, such as psychiatric and behavioral disorders, are commonly characterized through various measurements that reflect physical, behavioral and psychological aspects of diseases. While it remains a great challenge to find a unified measurement to characterize a disease, the available multiple phenotypes can be analyzed jointly in a genetic association study. Simultaneously testing these phenotypes has many advantages, including considering different aspects of the disease in the analysis and utilizing correlated phenotypes to improve the power of detecting disease-associated variants. Furthermore, complex diseases are likely caused by the interplay of multiple genetic variants through complicated mechanisms. Considering gene-gene interactions in the joint association analysis of complex diseases could further increase our ability to discover genetic variants involved in complex disease pathways. In this article, we propose a stepwise U-test for joint association analysis of multiple loci and multiple phenotypes. Through simulations, we demonstrated that testing multiple phenotypes simultaneously could attain higher power than testing a single phenotype at a time, especially when there are shared genes contributing to multiple phenotypes. We also illustrated the proposed method with an application to Nicotine Dependence (ND), using datasets from the Study of Addiction: Genetics and Environment (SAGE). The joint analysis of three ND phenotypes identified two SNPs, rs10508649 and rs2491397, and reached a nominal P-value of 3.79e-13. The association was further replicated in two independent datasets with P-values of 2.37e-05 and 7.46e-05.


Keywords: Nicotine dependence; Pleiotropy; Population-based association studies

Year: 2016 PMID: 28479869 PMCID: PMC5320542 DOI: 10.2174/1389202917666160513100946

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236

INTRODUCTION

Genome-wide association studies (GWASs) have been commonly adopted for investigating the genetic basis of complex human diseases, successfully identifying thousands of single nucleotide polymorphisms (SNPs) associated with complex diseases [1, 2]. However, for many complex diseases, the findings to date only explain a small percentage of heritability [3-5]. The genetic etiology of complex human diseases has remained largely unknown, and detecting genetic variants that account for the “missing heritability” has continued to be a major goal and challenge for the coming decade. The GWASs have commonly used a single-locus approach to test the association between a single SNP and a disease outcome of interest. Such a single-locus and single-phenotype strategy could have limitations on fully utilizing information from the genotype level and the phenotype level. First, complex diseases are usually caused by multiple genetic variants, each conferring a small to moderate effect. The single-locus tests could be under-powered due to the low effect sizes of causal variants and the burden of multiple testing. In addition, genetic variants may interact with one another through complicated mechanisms, and thus, may be overlooked if they are tested separately without considering possible interaction effects. Second, a complex disease may manifest with a wide variety of features, such as multiple measurements of a disease, intermediate phenotypes, sub-phenotypes, and endophenotypes. These phenotypes may better characterize the underlying disease etiology, and hence, provide more information than a single disease outcome [6]. In genetics, it is also a common phenomenon that shared genetic variants may simultaneously influence multiple phenotypes (i.e., pleiotropy) [7]. 
The successful identification of shared genetic variants contributing to seemingly distinct phenotypes will help elucidate the common genetic cause of these phenotypes, and will promote the development of more efficient strategies to treat or prevent these diseases. There is also a growing interest in analyzing multiple related phenotypes in GWAS [6, 8, 9]. For instance, Phenome-Wide Association Studies (PheWAS) analyze multiple phenotypes instead of a single phenotype. These studies commonly adopt a conventional single-SNP/single-phenotype approach in the analysis. Although convenient, the single-SNP/single-phenotype approach may significantly increase the number of statistical tests, leading to reduced power. To account for the issue of multiple testing due to the increased number of phenotypes, Lange et al. proposed to use principal components of phenotypes (PCP) for dimensionality reduction [10]. However, such a strategy is less straightforward to interpret, because the outcome becomes a linear combination of phenotypes. More importantly, a PCP captures key phenotype information, but not necessarily the phenotype information related to genetic information. To address this limitation, Klei et al. extended the PCP method with a principal components of heritability (PCH) method [11]. However, this PCH method requires estimating a PCH for each single SNP, which is computationally intensive for high-dimensional data. A number of other methods have also been proposed, using generalized estimating equations (GEE) and the generalized Kendall’s Tau test [12, 13]. It has been shown that these multi-phenotype tests have improved performance over a single-phenotype test. However, these available methods were mainly developed to test a single SNP at a time. Statistical methods that consider the joint effect of multiple variants on multiple phenotypes are still under-developed.
During the past decade, multi-locus tests considering gene-gene interactions have been increasingly used in genetic association analyses [14-20]. Non-parametric methods, such as U-statistic-based methods, have shown great promise for high-dimensional data analysis, especially when the underlying phenotype distributions and modes of inheritance are unknown. Various forms of U-statistics have been adopted for multi-locus association tests [21-24]. For example, Schaid et al. proposed a U-statistic-based score test that summarized a set of SNPs, and then examined their joint association with a phenotype [21]. Wei et al. extended this method by using data-adaptive weights for different genetic variants [22]. We and others have further considered possible interactions among genetic variants, and proposed a forward U-test and a likelihood ratio Mann-Whitney test for quantitative phenotypes and binary phenotypes, respectively [23, 24]. These multi-locus methods have emerged as promising tools in the joint association analysis of a single disease phenotype. It is also of great interest to extend those methods to the analysis of multiple phenotypes. In this article, we propose a U-statistic-based method, a stepwise U-test, for testing the joint association between multiple genetic variants and multiple phenotypes. It can be viewed as an extension of a recently developed forward U-test for single-phenotype analyses [23]. The proposed method has the following properties: 1) it searches forwardly for SNPs that are associated with one or more phenotypes; 2) it filters backwardly to remove phenotypes that are not relevant to genetic variants; and 3) it tests the joint effect among SNPs while allowing for possible interactions. Through simulations, we have shown that the proposed method had improved performance over a single-phenotype test. We also illustrated the proposed method with an application to Nicotine Dependence (ND).

METHODS

Suppose we have a study population of N subjects. Each subject has T measured phenotypes, and is genotyped with K SNPs. Let $Y_i = (y_{i1}, \dots, y_{iT})$ and $X_i = (x_{i1}, \dots, x_{iK})$ be the phenotypes and SNP genotypes for subject $i$. Here, we assume that all phenotypes are quantitative and may have unknown distributions. We further assume that 1) a subset of the phenotypes is associated with part of the K SNPs; and 2) a subset of the SNPs influences part of the T measured phenotypes, with possible interactions.

U-Statistic

We have recently proposed a U-statistic-based method, referred to as the forward U-test, to test the joint association between multiple loci and a single phenotype [23]. In this article, we extend the forward U-test to test the joint association between multiple loci and multiple phenotypes. Following similar notation, we assume k disease-associated SNPs comprising L multi-locus genotypes, denoted by $G_1, G_2, \dots, G_L$. The selection process of the k disease-associated SNPs is detailed below (Section 2.2). Here, a multi-locus genotype, $G_l$, is defined as a vector of k single-locus genotypes that an individual carries. We denote by $S_l$ the group of subjects carrying multi-locus genotype $G_l$, and by $m_l$ the number of subjects in $S_l$. For each single phenotype $t$ ($1 \le t \le T$), we first choose a kernel function $\phi(y_{it}, y_{jt})$ for a pair of subjects $i$ and $j$, and then define a general L-group U-statistic,

$$U_t = \sum_{1 \le l < l' \le L} \omega_{l,l'} U_{l,l'}, \qquad (1)$$

where $U_{l,l'}$ is a two-group U-statistic defined for groups $S_l$ and $S_{l'}$, and $\omega_{l,l'}$ is a weight parameter to account for the number of subjects in various genotype groups. Given the U-statistic of each phenotype defined in Eq. (1), a multivariate U-statistic for T phenotypes, $U$, can be formed as

$$U = (U_1, U_2, \dots, U_T)^{\top}. \qquad (2)$$

Under the null hypothesis of no association, it follows asymptotically a multivariate normal distribution, $U \sim N(0, \Sigma)$. The test statistic to evaluate the joint association of k SNPs and T phenotypes is thus defined as:

$$W = U^{\top} \Sigma^{-1} U, \qquad (3)$$

which follows a chi-square distribution with T degrees of freedom, $\chi^2(T)$. In practice, the sample covariance matrix is used in place of $\Sigma$ in Eq. (3), which is detailed in the Appendix.
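The per-phenotype construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the difference kernel, the ordering of groups by mean phenotype, and the weight `m_a * m_b / N**2` are all hypothetical choices made so that the example runs.

```python
from itertools import combinations


def kernel(y_i, y_j):
    # Assumed kernel: signed phenotype difference between two subjects.
    return y_i - y_j


def two_group_u(group_a, group_b):
    """Average kernel value over all cross-group pairs of subjects."""
    total = sum(kernel(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


def l_group_u(groups):
    """Weighted sum of two-group U-statistics over all pairs of groups.

    Assumptions (not taken from the article): groups are ordered by
    decreasing mean phenotype so every pairwise term is non-negative,
    and the weight for a pair is m_a * m_b / N**2.
    """
    ordered = sorted(groups, key=lambda g: sum(g) / len(g), reverse=True)
    n = sum(len(g) for g in ordered)
    return sum(
        (len(a) * len(b) / n**2) * two_group_u(a, b)
        for a, b in combinations(ordered, 2)
    )
```

Each element of `groups` is the list of phenotype values for the subjects carrying one multi-locus genotype. Running `l_group_u` once per phenotype gives the components that would be stacked into the multivariate statistic.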

Forward Selection of SNPs

When dealing with a large number of SNPs, it is likely that a significant proportion of SNPs are not disease-related. In this article, we follow the same strategy used in the forward U-test to select k disease-associated SNPs from the total K genotyped SNPs, and use them to build the above test statistic. This selection process starts with a single SNP. In the first step, each SNP j can partition the subjects into two genotype groups in three possible ways: {AA} versus {Aa, aa}; {AA, Aa} versus {aa}; and {AA, aa} versus {Aa}. We scan each SNP, and select the single SNP and partition with the maximum test statistic. We denote the corresponding partitioning strategy in the first step as $\{S_1, S_2\}$. In the second step, a second SNP is selected to further partition all subjects into four groups, $\{S_1, S_2, S_3, S_4\}$. Again, the SNP and partitioning strategy with the largest test statistic are selected. It should be noted that under the null hypothesis of no association, the four groups of subjects (i.e. $S_1, S_2, S_3, S_4$) have the same distribution of phenotypic values. Upon the rejection of the null hypothesis, the phenotypes are not all equal among the four groups of subjects, without assuming an additive effect from SNPs. Therefore, the proposed method tests for association while allowing for statistical interactions [25], and SNPs with non-linear interaction effects can be detected. By repeating the selection process, SNPs are selected forwardly to partition subjects into multi-locus genotype groups. To avoid over-fitting, a 10-fold cross-validation procedure is adopted to determine the most parsimonious model. Assuming the forward selection is stopped at step s, we then have the final model with s SNPs, which comprise L multi-locus genotype groups, $S_1, S_2, \dots, S_L$.
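The first forward step can be illustrated with a short sketch that enumerates the three two-group splits of a SNP and keeps the best one. It assumes genotypes coded 0, 1, 2, and it uses an absolute difference in group means (`abs_mean_diff`) as a hypothetical stand-in for the U-based test statistic. All function names are illustrative.

```python
def snp_partitions():
    """The three ways to split genotypes 0, 1, 2 into two non-empty groups."""
    return [({0}, {1, 2}), ({1}, {0, 2}), ({2}, {0, 1})]


def split_subjects(genotypes, phenotypes, partition):
    """Split phenotype values by which side of the partition a subject is on."""
    left, _right = partition
    group_a = [y for g, y in zip(genotypes, phenotypes) if g in left]
    group_b = [y for g, y in zip(genotypes, phenotypes) if g not in left]
    return group_a, group_b


def best_partition(genotypes, phenotypes, score):
    """Return (score, partition) for the split with the largest score."""
    best = None
    for partition in snp_partitions():
        a, b = split_subjects(genotypes, phenotypes, partition)
        if not a or not b:
            continue  # this split leaves one group empty in the sample
        s = score(a, b)
        if best is None or s > best[0]:
            best = (s, partition)
    return best


def abs_mean_diff(a, b):
    # Hypothetical stand-in for the U-based test statistic.
    return abs(sum(a) / len(a) - sum(b) / len(b))
```

Scanning all SNPs amounts to calling `best_partition` once per SNP and keeping the overall maximum. Later steps would repeat the scan within the groups already formed.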

Backward Selection of Phenotypes

When dealing with multiple phenotypes, it is also likely that a subset of phenotypes has no genetic relevance. Because the number of phenotypes is generally small, we propose to use a backward selection strategy to filter out phenotypes that are not genetically related. The selection process starts with all T available phenotypes. In the first step, multi-locus genotype groups are formed by using the forward selection process described in Section 2.2, with a corresponding test statistic $W_T$. In the second step, by removing one phenotype, $y_t$, at a time, T possible phenotype subsets can be formed, each with T-1 phenotypes. For each subset of phenotypes, multi-locus genotype groups are formed by using the forward selection, with a corresponding test statistic $W_{T-1}^{(-t)}$. The smallest test statistic obtained from the T possible phenotype subsets, $W_{T-1}$, will then be compared to that of T phenotypes, $W_T$, by their corresponding p-values. The genotype-phenotype association can be assessed by $P_T$ and $P_{T-1}$ for T phenotypes and T-1 phenotypes, respectively. We remove a phenotype, $y_t$, if $W_{T-1}$ leads to a more significant association than $W_T$, i.e. $P_{T-1} < P_T$. The backward selection of phenotypes and forward selection of SNPs are conducted iteratively until no phenotype can be removed to improve the significance of the association.
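The backward loop can be sketched as follows. The callback `p_value_for` is hypothetical: it stands for running the forward SNP selection on a given phenotype subset and returning the p-value of the resulting test. Choosing the candidate subset by its p-value is a simplification made for this sketch.

```python
def backward_select(phenotypes, p_value_for):
    """Drop phenotypes one at a time while doing so lowers the p-value.

    `p_value_for(subset)` is a hypothetical callback that reruns the
    forward SNP selection on `subset` (a tuple of phenotype names) and
    returns the p-value of the resulting association test.
    """
    current = tuple(phenotypes)
    current_p = p_value_for(current)
    while len(current) > 1:
        # All subsets obtained by removing exactly one phenotype.
        candidates = [
            tuple(p for p in current if p != dropped) for dropped in current
        ]
        best = min(candidates, key=p_value_for)
        best_p = p_value_for(best)
        if best_p >= current_p:
            break  # no removal improves the significance
        current, current_p = best, best_p
    return current, current_p
```

With three phenotypes of which one is pure noise, the loop would be expected to stop after removing the noise phenotype, provided its removal gives a smaller p-value than the full set.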

Test of Significance

Because the proposed method conducts model selection by maximizing the test statistic, the asymptotic test is no longer valid [26-28]. To examine the overall significance of the association, a permutation test is conducted by randomly shuffling the phenotypes and then applying the forward selection of SNPs and backward selection of phenotypes as described above. Based on the permutation distribution of the test statistic, an empirical P-value, which takes model selection into account, can be attained. In a replication study, where the multi-locus genotype combinations and the subset of phenotypes are pre-determined from an initial study, the overall significance of the association can be obtained from a chi-square distribution.
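A minimal sketch of the permutation step is given below. The callback `statistic` is hypothetical and stands for the full pipeline (forward selection of SNPs, backward selection of phenotypes, final test statistic) applied to one shuffled dataset. Shuffling whole per-subject phenotype rows and the add-one correction in the p-value are assumptions of this sketch.

```python
import random


def permutation_p_value(observed, phenotype_rows, statistic, n_perm=1000, seed=0):
    """Empirical p-value of `observed` against a permutation distribution.

    `phenotype_rows` holds one tuple of phenotypes per subject; rows are
    shuffled as a whole so correlations among phenotypes are preserved
    while the link to genotypes is broken. `statistic(rows)` is a
    hypothetical callback that reruns the whole selection procedure.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype_rows)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)
```

Because the selection is repeated inside every permutation, the resulting p-value accounts for the optimism introduced by maximizing the statistic.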

RESULTS

Simulation Studies

Simulation Settings

We conducted simulation studies to evaluate the proposed method, and compared it to the forward U-test, which analyzes one phenotype at a time. In each replicate, we simulated 1,000 subjects, each genotyped with 10 SNPs. The genotypes were simulated by assuming a minor allele frequency of 0.3 and Hardy-Weinberg Equilibrium (HWE). Each simulation scenario was repeated 1,000 times to evaluate the type I error rates and statistical power of the two methods. For simplicity, we assumed an additive model for the kth SNP (i.e., $x_k = 0$ for AA, $x_k = 1$ for Aa, and $x_k = 2$ for aa). We first evaluated the type I error rate of the proposed method by simulating the phenotypes independently from the genotypes, assuming each phenotype follows a standard normal distribution. The type I error rates were evaluated for a varying number of phenotypes (i.e. from 1 to 5). To evaluate statistical power, phenotypes were simulated according to various disease scenarios described below.
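The genotype simulation can be sketched as below. Under HWE the two alleles of a subject are independent draws, so each genotype is the number of minor alleles among two Bernoulli draws with probability equal to the minor allele frequency. Coding genotypes as 0, 1 or 2 minor alleles, treating SNPs as independent, and the function name are assumptions of this sketch.

```python
import random


def simulate_genotypes(n_subjects, n_snps, maf=0.3, seed=0):
    """Simulate independent SNP genotypes under Hardy-Weinberg Equilibrium.

    Each genotype is the count of minor alleles (0, 1 or 2) obtained from
    two independent allele draws with minor allele frequency `maf`.
    """
    rng = random.Random(seed)
    return [
        [sum(rng.random() < maf for _ in range(2)) for _ in range(n_snps)]
        for _ in range(n_subjects)
    ]
```

Calling `simulate_genotypes(1000, 10)` gives one replicate of the size used in the simulation settings, as a list of 1,000 rows with 10 genotypes each.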

Simulation I: Varying Number of Shared SNPs

In the first simulation, we considered two phenotypes, each influenced by 4 SNPs with an additive effect. We evaluated the performance of the proposed method by varying the number of shared SNPs, q, that were associated with both phenotypes (i.e. from 0 to 4). The two phenotypes were thus simulated by:

$$y_1 = 0.15 \sum_{k=1}^{4} x_k + \varepsilon_1; \qquad y_2 = 0.15 \sum_{k=1}^{q} x_k + 0.15 \sum_{k=5}^{8-q} x_k + \varepsilon_2,$$

where the first q SNPs were shared SNPs that were associated with both phenotypes, and the remaining (4-q) SNPs of each phenotype were unique SNPs that were associated with only one of the phenotypes. In such a disease scenario, we expected that the shared genetic components of the two phenotypes would increase as the number of shared SNPs increased.
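The SNP sets behind each scenario of Simulation I can be written out with a small helper, following the disease models listed in Table 2 (effect size 0.15 per SNP). The function names are hypothetical, and only the genetic mean of each phenotype is computed. The random error term would be added separately.

```python
def causal_snps(q, n_causal=4):
    """1-based SNP indices influencing y1 and y2 when q SNPs are shared."""
    y1 = list(range(1, n_causal + 1))
    shared = list(range(1, q + 1))
    # SNPs unique to y2 start right after the causal SNPs of y1.
    unique = list(range(n_causal + 1, 2 * n_causal - q + 1))
    return y1, shared + unique


def phenotype_mean(genotype, snps, beta=0.15):
    """Additive genetic mean: beta times the sum of coded genotypes."""
    return beta * sum(genotype[k - 1] for k in snps)
```

For example, `causal_snps(1)` returns SNPs 1 to 4 for the first phenotype and SNPs 1, 5, 6 and 7 for the second, matching the second disease model in Table 2.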

Simulation II: Varying Effect Size

In the second simulation, we also considered two phenotypes, each influenced by 2 SNPs with an additive effect. We further assumed that the two phenotypes shared one causal SNP. The phenotypes were thus simulated by:

$$y_1 = \beta_s x_1 + \beta_u x_2 + \varepsilon_1; \qquad y_2 = \beta_s x_1 + \beta_u x_3 + \varepsilon_2,$$

where $x_1$ was the shared SNP that influenced both phenotypes, and $x_2$ and $x_3$ were unique SNPs that each influenced only one of the phenotypes. We evaluated the performance of the proposed method by varying the relative contribution between the shared SNP and the unique SNPs (i.e. $\beta_s, \beta_u \in \{0.1, 0.2\}$).

Simulation III: Varying Patterns of Interaction Effects

In the third simulation, we considered two phenotypes, influenced by 2 shared SNPs, but through various modes of inheritance that may or may not involve interactions. Each phenotype was simulated from one of three possible disease models: an additive effect model, a multiplicative effect model, and a threshold effect model. The first model did not have an interaction effect, while the other two models had an interaction effect. The details of the simulation are described below:

a. Both phenotypes were simulated through an additive effect model: $y_1 = 0.1x_1 + 0.1x_2 + \varepsilon_1$; $y_2 = 0.1x_1 + 0.1x_2 + \varepsilon_2$.

b. One phenotype was simulated through an additive effect model, while the other phenotype was simulated through a multiplicative effect model, which assumed an interaction effect on a multiplicative scale: $y_1 = 0.1x_1 + 0.1x_2 + \varepsilon_1$; $y_2 = 0.1x_1 + 0.1x_2 + 0.05x_1x_2 + \varepsilon_2$.

c. One phenotype was simulated through an additive effect model, while the other phenotype was simulated through a threshold effect model, which assumed an interaction effect in the presence of minor alleles at both SNPs: $y_1 = 0.1x_1 + 0.1x_2 + \varepsilon_1$; $y_2 = 0.2\,I(x_1 > 0)\,I(x_2 > 0) + \varepsilon_2$.

d. One phenotype was simulated through a multiplicative effect model, while the other phenotype was simulated through a threshold effect model: $y_1 = 0.1x_1 + 0.1x_2 + 0.05x_1x_2 + \varepsilon_1$; $y_2 = 0.3\,I(x_1 > 0)\,I(x_2 > 0) + \varepsilon_2$.
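The three disease models can be written as small functions of the two coded genotypes. The coefficients below are read from the disease models listed in Table 4. The function and parameter names are hypothetical, and each function returns only the genetic mean, to which a random error term would be added.

```python
def additive(x1, x2, beta=0.1):
    """Additive model: no interaction between the two SNPs."""
    return beta * x1 + beta * x2


def multiplicative(x1, x2, beta=0.1, gamma=0.05):
    """Additive effects plus an interaction on a multiplicative scale."""
    return beta * x1 + beta * x2 + gamma * x1 * x2


def threshold(x1, x2, delta=0.2):
    """Effect only when minor alleles are present at both SNPs."""
    return delta if (x1 > 0 and x2 > 0) else 0.0
```

A subject heterozygous at both SNPs has a genetic mean of 0.2 under the additive model, 0.25 under the multiplicative model, and 0.2 under the threshold model (0.3 when `delta=0.3`, as in the last scenario of Table 4).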

Simulation IV: Varying Number of Phenotypic Traits

In the fourth simulation, we considered a varying number of phenotypes (i.e. from 3 to 5). We further assumed only 2 phenotypes were genetically related, so that the number of noise phenotypes varied from 1 to 3. The first two phenotypes were simulated through the disease models discussed in Simulation III, while the remaining phenotypes were simulated independently from the genotypes, assuming a standard normal distribution.

Simulation Results

Type I Error

The results of type I error for the stepwise U-test are summarized in Table 1. The results showed that the type I error of the new method remained well controlled at the level of 0.05 for different numbers of phenotypes.

To evaluate the statistical power, we conducted 1,000 permutation replicates for each simulation scenario. The power was defined as the probability of the observed test statistic exceeding the 95th percentile of the empirical permutation distribution. We also used sensitivity and specificity to measure the accuracy of SNP selection. In particular, Sensitivity A was defined as the probability of selecting a causal SNP that influenced only one of the phenotypes; Sensitivity B was defined as the probability of selecting a causal SNP that influenced both phenotypes; and Specificity was defined as 1 minus the probability of selecting a SNP that influenced none of the phenotypes. The definitions of these measurements remained the same for all simulation scenarios.

Simulation I: Varying Number of Shared SNPs

The results of Simulation I are summarized in Table 2. The results showed that the power of single-phenotype analysis remained stable around 0.50 (i.e. between 0.481 and 0.530). When the two phenotypes shared no causal SNPs (i.e. q=0), the power of the multi-phenotype analysis (i.e. 0.544) was comparable to that of the single-phenotype analysis. However, the power of multi-phenotype analysis increased as the number of shared SNPs increased. When all causal SNPs were shared SNPs (i.e. q=4), the power of the multi-phenotype analysis (i.e. 0.903) was substantially higher than that of single-phenotype analysis. In terms of SNP selection, multi-phenotype analysis showed a better ability to select shared SNPs than single-phenotype analysis (i.e. Sensitivity B), but a reduced probability of selecting unique SNPs (i.e. Sensitivity A). In terms of specificity, single-phenotype analysis and multi-phenotype analysis had comparable performance (i.e. around 95%).

Simulation II: Varying Effect Size

The results of Simulation II are summarized in Table 3. When the effect sizes of causal SNPs increased, the statistical power of both multi-phenotype analysis and single-phenotype analysis increased. Furthermore, when the effect sizes of shared SNPs or unique SNPs increased, the power of single-phenotype analysis increased to a similar degree. In contrast, the power of multi-phenotype analysis increased substantially when the effect size of the shared SNP increased. In terms of SNP selection, SNPs with larger effect sizes were more likely to be selected by either single-phenotype or multi-phenotype analysis. Multi-phenotype analysis may increase the probability of selecting a shared SNP (i.e. Sensitivity B), but reduce the probability of selecting a unique SNP (i.e. Sensitivity A). The specificity remained at a high level for both single-phenotype and multi-phenotype analyses (i.e. 90% or higher).
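The selection-accuracy measures used in these results (Sensitivity A, Sensitivity B and Specificity) can be computed for one replicate as in the sketch below. It assumes that each measure is the fraction of SNPs of the relevant type that were selected, to be averaged over replicates. That averaging rule and the function name are assumptions of this sketch.

```python
def selection_accuracy(selected, unique, shared, n_snps):
    """Return (sensitivity_a, sensitivity_b, specificity) for one replicate.

    `unique` and `shared` are the 1-based indices of causal SNPs that
    influence one phenotype or both phenotypes; every other SNP among the
    `n_snps` genotyped SNPs is treated as non-causal. A measure is None
    when there is no SNP of that type.
    """
    selected = set(selected)
    unique, shared = set(unique), set(shared)
    noise = set(range(1, n_snps + 1)) - unique - shared

    def frac(pool):
        return len(selected & pool) / len(pool) if pool else None

    false_rate = frac(noise)
    specificity = None if false_rate is None else 1 - false_rate
    return frac(unique), frac(shared), specificity
```

For a replicate with 10 SNPs in which SNP 1 is shared, SNPs 2 and 3 are unique, and the method selects SNPs 1 and 5, the function gives Sensitivity A of 0, Sensitivity B of 1 and Specificity of 6/7.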

Simulation III: Varying Underlying Disease Models

The simulation results are summarized in Table 4. The results showed that both single-phenotype and multi-phenotype analysis were able to detect the joint association when there was an interaction effect between SNPs. Furthermore, multi-phenotype analysis attained increased power over single-phenotype analysis. The power improvement was achieved with or without an interaction effect. In terms of SNP selection, multi-phenotype analysis had improved sensitivity and specificity over single-phenotype analysis for all scenarios.

Simulation IV: Varying Number of Phenotypes

The simulation results are summarized in Table 5. The results showed that the power decreased slightly as the number of noise phenotypes increased. In terms of SNP selection, both sensitivity and specificity generally decreased when the number of noise phenotypes increased.

In summary, our simulations have shown that: 1) Compared to the analysis of a single phenotype with the forward U-test, the analysis of multiple phenotypes with the stepwise U-test has increased power to detect the association, especially when the phenotypes share relatively large genetic causes (e.g. more shared SNPs, larger effect size of shared SNPs). 2) The stepwise U-test has an increased probability of detecting shared SNPs, but a reduced probability of detecting SNPs that are only causal to a particular phenotype. 3) Similar to the forward U-test, the stepwise U-test is able to detect the joint association when there are interactions between genetic variants. 4) The performance of the stepwise U-test remains robust in the presence of noise phenotypes.

Application to a Nicotine Dependence (ND) Dataset

We illustrated the proposed stepwise U-test with an application to a dataset from the Study of Addiction: Genetics and Environment (SAGE). The SAGE study is part of the Gene Environment Association Studies initiative (GENEVA) funded by the National Human Genome Research Institute. The SAGE samples were selected from three large complementary datasets: the Family Study of Cocaine Dependence (FSCD), the Collaborative Study on the Genetics of Alcoholism (COGA), and the Collaborative Genetic Study of Nicotine Dependence (COGEND) [29]. All samples in SAGE were unrelated and had quantitative measurements of various phenotypes for addictions, such as alcohol, nicotine, marijuana, cocaine, opiates and other drugs. In this article, we focused on three ND-related phenotypes: the participant’s lifetime score on the Fagerström Test for Nicotine Dependence (ftnd_total), the number of cigarettes smoked per day (ftnd_4), and the number of nicotine symptoms endorsed (nic_sx_tot). We evaluated the joint association between the three phenotypes and 155 SNPs that were reported for their potential association with ND. Because the SAGE study only had the genotypes of 128 SNPs, we further imputed the genotypes of the other 27 SNPs by using PLINK [30]. Our study population was mainly biracial, and we used HapMap phase III founders of CEU (Utah residents with Northern and Western European ancestry) and ASW (African ancestry in Southwest USA) as the reference panels for the Caucasian and African American subjects, respectively [31]. We applied the stepwise U-test to samples of COGEND for an initial association analysis and to samples of FSCD and COGA for the replication analysis. The results are summarized in Table 6. Based on the initial dataset COGEND, the analysis identified two SNPs, rs10508649 and rs2491397, jointly associated with the three ND-related phenotypes, with a nominal P-value of 3.79e-13. By using permutation, the empirical P-value of the association reached the significance level of 0.001. 
This association remained significant in both FSCD (P-value=2.37e-05) and COGA (P-value=7.46e-05). For comparison purposes, we also conducted single-phenotype analyses by using the forward U-test. The findings of single-phenotype analyses varied among the three phenotypes. Based on the initial dataset of COGEND, 1) the analysis of the lifetime FTND score (ftnd_total) identified the same two SNPs as the multi-phenotype analysis; 2) the analysis of the number of cigarettes smoked per day (ftnd_4) revealed a different SNP, rs2036527; 3) the analysis of the number of nicotine symptoms endorsed (nic_sx_tot) found two SNPs, rs10508649 and rs7517376, one of which overlapped with the SNPs identified from the multi-phenotype analysis. All of the findings from single-phenotype analyses showed significant associations in the initial dataset COGEND. However, these associations could not be replicated in either FSCD or COGA. This result indicated that the proposed multi-phenotype strategy might improve the testing power and obtain more robust findings than its single-phenotype alternative.

DISCUSSION

Complex diseases are thought to be influenced by the interplay of hundreds or even thousands of genetic variants through complex mechanisms [32]. Multi-locus methods, taking genetic interactions into account, could have improved power to detect disease-susceptibility genetic variants. Furthermore, complex phenotypes, such as nicotine dependence, are commonly assessed by multiple measures that are complementary to each other [33-37]. For example, the two gold-standard measures of nicotine dependence, the FTND score and the Diagnostic and Statistical Manual of Mental Disorders (DSM), were found to have a relatively low concordance, with a Kappa estimate of 0.2 [35, 38]. It was suggested that the FTND and DSM measurements emphasize physical symptoms and psychiatric symptoms, respectively, each of which reflects a unique aspect of ND development. Other studies have also pointed out that ND can be assessed through various aspects, including physical, behavioral and psychological components [36]. While it remains challenging to define a single comprehensive measurement for better characterizing complex phenotypes, such as ND, new statistical methods can be used to facilitate the genetic discovery process by taking advantage of currently available multiple phenotypes in the analysis. In this article, we proposed a stepwise U-test for testing the joint association between multiple loci and multiple phenotypes. Similar to the forward U-test developed for single-phenotype analyses, the proposed method is entirely non-parametric, and makes no assumption about the phenotype distribution or the underlying disease mechanisms (i.e. modes of inheritance). We conducted simulation studies to compare the performance of two testing strategies: single-phenotype analysis and multi-phenotype analysis. 
Our simulation results demonstrated that multi-phenotype analysis could have better performance than single-phenotype analysis, especially when the phenotypes of interest have similar underlying genetic etiologies (e.g., share part of their causal genetic variants). The better performance of multi-phenotype analysis can be explained by its capacity to capture the collective effect of genetic variants over all relevant phenotypes. When a significant number of shared SNPs contribute to these phenotypes, multi-phenotype analysis is expected to outperform single-phenotype analysis. In this article, we have focused on the association analysis of multiple quantitative phenotypes, by using a kernel function to measure the phenotype difference between two subjects. For dichotomous phenotypes, an extension can be made by using a different kernel function based on the likelihood ratio of each genotype group [39]. Similar to the forward U-test for the analysis of a single phenotype, the stepwise U-test also adopts a forward search strategy to select disease-associated SNPs. Therefore, it is computationally feasible to apply the stepwise U-test to a relatively large number of SNPs. The computational time will depend on various factors, such as the number of variants, the number of phenotypes and the sample size. Under our simulation setting with 2 phenotypes, 10 SNPs and 1,000 samples, it took an average computation time of 128.4 (SD=71.6) seconds to run each replicate on a desktop with a single core of 2.90 GHz and 8 GB RAM. In the real data application, we identified and replicated the joint association of two SNPs, rs10508649 and rs2491397, with three ND phenotypes based on three independent datasets. The two SNPs are located in two genes, PIP4K2A and GABBR2, respectively. GABBR2, also known as GABAB2, encodes the gamma-aminobutyric acid (GABA) B receptor 2, a G-protein coupled receptor subunit that mediates inhibitory neurotransmission in the central nervous system [40]. 
SNP rs2491397 was reported to be associated with the development of ND through haplotypes in the GABBR2 gene [41], which was found to be associated with a number of measurements of ND, including the smoking quantity (SQ), the heaviness of smoking index (HSI), and the FTND score [42-44]. GABAB2 was also reported to interact with other genes, such as GABAB1, for a joint association with ND [45]. Moreover, PIP4K2A was found to be associated with other psychiatric disorders, such as schizophrenia [46-49]. SNP rs10508649 is located within PIP4K2A, and was found to be associated with the ND outcome measured by the FTND score [50]. In our study, the results also indicated that this SNP was potentially associated with other ND measurements, such as the number of symptoms endorsed. While it is biologically plausible that these two identified SNPs may be involved in a number of manifestations of ND, further studies are still needed to replicate the findings and investigate their effects on ND development.

Table 1

Type I error rates of the stepwise U-test for different numbers of phenotypes.

Number of Phenotypes	1	2	3	4	5
Type I error	0.042	0.053	0.045	0.051	0.050

Table 2

Power comparison between single-phenotype analyses and multi-phenotype analyses when the number of shared SNPs varies.

Disease Model	Measure	Single-Phen⁴ (y1)	Single-Phen⁴ (y2)	Multi-Pheno⁵ (y1, y2)
y1=0.15x1+0.15x2+0.15x3+0.15x4+ε1; y2=0.15x5+0.15x6+0.15x7+0.15x8+ε2	Power	0.509	0.503	0.544
	Sensitivity A¹	0.481	0.474	0.241
	Sensitivity B²	--	--	--
	Specificity³	0.960	0.956	0.959
y1=0.15x1+0.15x2+0.15x3+0.15x4+ε1; y2=0.15x1+0.15x5+0.15x6+0.15x7+ε2	Power	0.489	0.509	0.626
	Sensitivity A	0.471	0.471	0.216
	Sensitivity B	0.486	0.476	0.628
	Specificity	0.957	0.962	0.967
y1=0.15x1+0.15x2+0.15x3+0.15x4+ε1; y2=0.15x1+0.15x2+0.15x5+0.15x6+ε2	Power	0.514	0.513	0.769
	Sensitivity A	0.489	0.473	0.194
	Sensitivity B	0.462	0.479	0.666
	Specificity	0.961	0.961	0.973
y1=0.15x1+0.15x2+0.15x3+0.15x4+ε1; y2=0.15x1+0.15x2+0.15x3+0.15x5+ε2	Power	0.527	0.530	0.880
	Sensitivity A	0.463	0.491	0.152
	Sensitivity B	0.476	0.486	0.675
	Specificity	0.959	0.962	0.974
y1=0.15x1+0.15x2+0.15x3+0.15x4+ε1; y2=0.15x1+0.15x2+0.15x3+0.15x4+ε2	Power	0.491	0.481	0.903
	Sensitivity A	--	--	--
	Sensitivity B	0.481	0.466	0.677
	Specificity	0.960	0.955	0.973

¹ Sensitivity A: the probability of selecting a causal SNP that influences only one phenotype

² Sensitivity B: the probability of selecting a causal SNP that influences both phenotypes

³ Specificity: 1 minus the probability of selecting a SNP that influences none of the phenotypes

⁴ Single-phenotype analyses are conducted by using the forward U-test

⁵ Multi-phenotype analyses are conducted by using the stepwise U-test

Table 3

Power comparison between single-phenotype analysis and multi-phenotype analysis when the effect sizes vary.

Disease Model	Measure	Single-Phen¹ (y1)	Single-Phen¹ (y2)	Multi-Phen² (y1, y2)
y1=0.1x1+0.1x2+ε1; y2=0.1x1+0.1x3+ε2	Power	0.191	0.178	0.353
	Sensitivity A	0.439	0.473	0.260
	Sensitivity B	0.456	0.462	0.600
	Specificity	0.903	0.900	0.913
y1=0.1x1+0.2x2+ε1; y2=0.1x1+0.2x3+ε2	Power	0.320	0.340	0.449
	Sensitivity A	0.944	0.929	0.505
	Sensitivity B	0.253	0.238	0.281
	Specificity	0.954	0.952	0.959
y1=0.2x1+0.1x2+ε1; y2=0.2x1+0.1x3+ε2	Power	0.330	0.335	0.823
	Sensitivity A	0.246	0.233	0.087
	Sensitivity B	0.940	0.941	0.996
	Specificity	0.950	0.962	0.971
y1=0.2x1+0.2x2+ε1; y2=0.2x1+0.2x3+ε2	Power	0.645	0.650	0.927
	Sensitivity A	0.845	0.823	0.371
	Sensitivity B	0.832	0.846	0.916
	Specificity	0.960	0.954	0.971

¹ Single-phenotype analysis is conducted using the forward U-test

² Multi-phenotype analysis is conducted using the stepwise U-test

Table 4

Power comparison between single-phenotype analysis and multi-phenotype analysis by varying underlying disease models.

Disease Model	Measure	Single-Phen¹ (y1)	Single-Phen¹ (y2)	Multi-Phen² (y1, y2)
y1=0.1x1+0.1x2+ε1; y2=0.1x1+0.1x2+ε2	Power	0.167	0.181	0.404
	Sensitivity	0.453	0.469	0.582
	Specificity	0.900	0.904	0.933
y1=0.1x1+0.1x2+ε1; y2=0.2I(x1>0)I(x2>0)+ε2	Power	0.165	0.124	0.343
	Sensitivity	0.457	0.378	0.526
	Specificity	0.904	0.883	0.916
y1=0.1x1+0.1x2+ε1; y2=0.1x1+0.1x2+0.05x1x2+ε2	Power	0.162	0.280	0.512
	Sensitivity	0.447	0.567	0.660
	Specificity	0.900	0.930	0.945
y1=0.1x1+0.1x2+0.05x1x2+ε1; y2=0.3×I(x1>0)I(x2>0)+ε2	Power	0.314	0.334	0.722
	Sensitivity	0.582	0.600	0.744
	Specificity	0.930	0.934	0.953

¹ Single-phenotype analysis is conducted using the forward U-test

² Multi-phenotype analysis is conducted using the stepwise U-test
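
To make the disease models in this table concrete, the following sketch draws data from the last one (y1=0.1x1+0.1x2+0.05x1x2+ε1, y2=0.3×I(x1>0)I(x2>0)+ε2). Only the two equations come from the table. The function name `simulate_two_phenotypes`, the additive 0/1/2 genotype coding, the minor allele frequency of 0.3, and the independent standard normal errors are all assumptions made for illustration.

```python
import random


def simulate_two_phenotypes(n, maf=0.3, seed=0):
    """Draw n subjects as (x1, x2, y1, y2) tuples (illustrative sketch only)."""
    rng = random.Random(seed)

    def genotype():
        # Assumption: genotype = number of minor alleles (0, 1 or 2),
        # with the two alleles drawn independently.
        return int(rng.random() < maf) + int(rng.random() < maf)

    rows = []
    for _ in range(n):
        x1, x2 = genotype(), genotype()
        # I(x1>0)I(x2>0): 1 only when both loci carry a minor allele.
        both_carried = 1.0 if (x1 > 0 and x2 > 0) else 0.0
        # Assumption: ε1 and ε2 are independent standard normal errors.
        y1 = 0.1 * x1 + 0.1 * x2 + 0.05 * x1 * x2 + rng.gauss(0.0, 1.0)
        y2 = 0.3 * both_carried + rng.gauss(0.0, 1.0)
        rows.append((x1, x2, y1, y2))
    return rows
```

The other models in the table differ only in the two phenotype equations, so the same skeleton would apply with those lines swapped.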

Table 5

Performance of multi-phenotype analysis with varying number of noise phenotypes.

Disease Model	Measure	2 Pheno	+1 noise	+2 noise	+3 noise
y1=0.2x1+0.2x3+ε1; y2=0.2x2+0.2x3+ε2	Power	0.927	0.922	0.906	0.852
	Sensitivity A	0.371	0.348	0.344	0.339
	Sensitivity B	0.916	0.896	0.892	0.881
	Specificity	0.971	0.963	0.956	0.950
y1=0.1x1+0.1x2+0.05x1x2+ε1; y2=0.3×I(x1>0)I(x2>0)+ε2	Power	0.722	0.716	0.653	0.570
	Sensitivity	0.744	0.676	0.663	0.638
	Specificity	0.953	0.965	0.951	0.938
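
The "+1 noise" to "+3 noise" columns correspond to analysing the two associated phenotypes together with extra phenotypes that carry no genetic signal. A minimal sketch of that setup is shown here. The helper `add_noise_phenotypes` is hypothetical, and drawing each noise phenotype as an independent standard normal variable is an assumption, not a detail stated in this table.

```python
import random


def add_noise_phenotypes(phenotypes, k, seed=0):
    """Append k noise phenotypes to every subject (illustrative sketch only).

    phenotypes: one sequence of phenotype values per subject
    k:          number of noise phenotypes to append
    """
    rng = random.Random(seed)
    # Assumption: a noise phenotype is an independent N(0, 1) draw that
    # depends on no SNP.
    return [list(row) + [rng.gauss(0.0, 1.0) for _ in range(k)]
            for row in phenotypes]
```

With `k=0` the phenotypes are returned unchanged, which corresponds to the "2 Pheno" column.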

Table 6

Summary of multi-phenotype analysis and single-phenotype analysis of three independent datasets, COGEND, FSCD and COGA.

Phenotype	SNP	Allele	Gene	Grouping	P-value (COGEND)	P-value (FSCD)	P-value (COGA)
Multiple Phenotype Analyses
3 Phenotypes	rs10508649; rs2491397	C/T; C/T	PIP4K2A; GABBR2	TT or CC/CT; CC or CT/TT	3.79e-13	2.37e-05	7.46e-05
Single Phenotype Analyses
FTND_4	rs2036527	A/G	CHRNA5	AA or AG/GG	3.06e-05	0.180	0.219
FTND_total	rs10508649; rs2491397	C/T; C/T	PIP4K2A; GABBR2	TT or CC/CT; CC or CT/TT	1.39e-07	0.228	0.066
Nic_sx_tot	rs10508649; rs7517376	C/T; A/G	PIP4K2A; FMO1	TT or CC/CT; AA or AG/GG	4.92e-07	0.056	0.526
