
Extension of the generalized disequilibrium test to polytomous phenotypes and two-locus models.

Alexandre Bureau¹, Jordie Croteau², Yvon C Chagnon², Marc-André Roy³, Michel Maziade³.

Abstract

We extend the usual logistic model between a dichotomous phenotype and an allele count in two ways: a polytomous phenotype with K > 2 levels, and modeling of allele counts at two unlinked marker loci. Inference is based on within-family information to guard against potential bias due to population genetic structure. Score tests of the model coefficients taking into account the correlation between relatives in entire pedigrees are derived as an extension of the Generalized Disequilibrium Test (GDT). Simulations confirm that the tests have the expected statistical properties, and that their power exceeds that of the GDT under a favorable scenario. The score tests are illustrated with candidate genetic markers, a major psychosis phenotype and a cognitive endophenotype in large kindreds from Eastern Quebec.


Keywords: conditional likelihood; endophenotype; family-based association; kinship; major psychosis; polytomous logistic model; score test

Year: 2014 PMID: 25152751 PMCID: PMC4126369 DOI: 10.3389/fgene.2014.00258

Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599

1. Introduction

Studies of the association between a phenotype and genetic markers are commonly performed on the members of families of various sizes. While methods to estimate association parameters and test the null hypothesis of absence of association (possibly coupled with absence of genetic linkage) with dichotomous phenotypes in family samples are well developed (see for instance chapter 12 of Ziegler and König, 2010), methods are lacking to analyze polytomous phenotypes. Such phenotypes can arise when a disease has multiple subtypes (Guey et al., 2010) or when two dichotomous phenotypes are considered simultaneously. The latter occurs when endophenotypes are measured in genetic studies to better capture phenotypic complexity. Endophenotypes are traits related to a disease and believed to be influenced by fewer genes (Gottesman and Gould, 2003). A dichotomous disease status and a dichotomous endophenotype create a four-category phenotype. Comparisons between analyzing a polytomous phenotype and a dichotomous one have not been done for family studies because analysis methods for polytomous phenotypes were lacking.
We focus in this paper on a within-family analysis, conditional on the phenotypes and genotypes observed in each family. Such an approach is well known to protect against confounding due to population stratification. Families where multiple phenotypic categories are represented provide the most information on the relationship between a polytomous phenotype and genetic markers. Since families extending over multiple generations typically need to be recruited to obtain a large number of phenotyped subjects, we required that the methods for dichotomous traits that we generalize to polytomous traits be applicable to extended families. For a score test of association, we selected the Generalized Disequilibrium Test (GDT) of Chen et al. (2009).
In previous work, we showed by simulation that conditioning on a marker at a known disease susceptibility locus increased power to detect linkage to new loci interacting with that disease susceptibility locus (Bureau et al., 2009, 2012). Similar power gains are expected in association analysis, as conditioning on a known environmental risk factor increases power to detect loci interacting with the exposure (Kraft et al., 2007). Models involving genetic markers at two distinct loci are needed for analyses conditional on the genotype of known disease susceptibility markers and also to model the relationship between pairs of loci. Multi-category phenotypes present a larger realm of possibilities of interplay between multiple loci than dichotomous traits, making multilocus modeling even more important to capture the actual effects. This is why we derive score tests under two-locus models, with one marker at each locus, in addition to one-locus models. The Type I error and the power of tests of various combinations of regression coefficients are assessed using simulation. The tests are also illustrated with candidate genetic markers, a major psychosis phenotype and a cognitive endophenotype in the Eastern Quebec kindred study.

2. Methods

We extend the GDT of Chen et al. (2009) in two ways: by allowing the outcome Y to have K > 2 levels, and by allowing the odds of the outcome categories to depend on two or more variables X coding the genotype of markers at two mutually unlinked marker loci. As in the original GDT, X represents the count of a particular form of the DNA sequence at the marker, called an allele. We begin by deriving the score statistic from the conditional likelihood for a polytomous outcome Y with a general vector X of allelic count terms (possibly including product terms). Then we derive expressions for particular forms of terms in X.
The polytomous model for subject i with a general vector Xi is a polytomous logistic model whose linear predictor for level (category) k is formed from Xik, the sub-vector of Xi containing the allelic terms related to level k, and βk, the sub-vector of the full coefficient vector β applicable to level k (in this general formulation, β coefficients can either be distinct for each level k or be common to multiple levels). Without loss of generality, we assume that the n genotyped pedigree members with an observed phenotype are ordered such that the first n1 subjects are in outcome category Y = 1, the n2 following subjects are in outcome category Y = 2 and so on up to the last nK subjects with Y = K.
With K = 2 and a single X (β a scalar, without covariates), Chen et al. (2009) showed that the contribution of the family to the score statistic from the conditional likelihood to test the null hypothesis β = 0 is a sum of the differences Xi − Xj over all pairs made of a subject i in the first outcome category and a subject j in the second. We show in Supplementary Material that this result extends to the contribution of a family to the score statistic Sh for the coefficient βh, a component of β, when testing the global null hypothesis β = 0 under a polytomous model. The extended expression is defined in terms of index sets E of subjects determined by the category counts n1, ···, nK and of the slice of X related to the coefficient βh.
If βh is involved only in the logistic function between levels a and K, then the score statistic simplifies, with the subjects of category a indexed by Ea = {n1 + ··· + n(a−1) + 1, ···, n1 + ··· + na} for a > 1 and E1 = {1, ···, n1}. The advantage of expression 3 is that a closed-form expression for the variance of Sh and the covariance of Sh and Sl for coefficients βh and βl can be derived, following the steps of Chen et al. (2009). It is also easier to interpret. When the tested coefficient belongs to the logistic function attached to a single outcome category and the score statistic reduces to expression 4, it is a contrast of the value of the corresponding X term between subjects in the outcome category and subjects in all other categories. Letting v[S] be an estimate of the variance-covariance matrix of S, the null hypothesis that β = 0 can then be tested with a quadratic form in S and v[S], which follows a χ2 distribution with degrees of freedom equal to the rank of the variance-covariance matrix under the null.
When testing the sub null hypothesis βh1 = ··· = βhm = 0 for any subset of indices h1, ···, hm, the other coefficients are free to differ from 0 and the derivation in Supplementary Material no longer applies. We adopt here the approach Chen et al. (2009) apply to model covariates, which is to weight the pairwise differences according to a model of the outcome Y as a function of the predictors with free coefficients under the null hypothesis. The score statistic for the component βh of the subset of coefficients tested then becomes a weighted sum of pairwise differences, where the weights C can be derived from score equations for β under the pairwise formulation of Liang and Stewart (1987) (see Supplementary Material). When the variability from estimating the free coefficients is neglected, this leads to weights that are functions of the coefficients βc of a polytomous logistic model of Y as a function of the predictors Xc, c = {l : l ∉ (h1, ···, hm)} (weight definition 6). Adapting Chen et al. (2009)'s Equation 2 from the dichotomous to the polytomous case gives an alternative expression for the weights (weight definition 7). We estimate the coefficients βc using generalized estimating equations (GEEs) with an independence working correlation matrix.
With this approach the null hypothesis that the component βh = 0 can be tested with the score Sh standardized by the square root of its variance estimate, which follows approximately a standard normal distribution under the null when the weights are defined in such a way that the expectation of Sh is 0. The weight definition will only have an impact on power. The joint null hypothesis βh1 = ··· = βhm = 0 for any subset of indices h1, ···, hm can be tested with the corresponding quadratic-form statistic (Equation 8), which follows approximately a χ2 distribution with m degrees of freedom under the null.
The variance of S depends on whether the null hypothesis refers only to absence of association, or to absence of genetic linkage and association. In the first case, the null distribution of S allows genetic linkage at the locus, and the identical-by-descent (IBD) sharing proportions in the variance estimate must be the actual IBD sharing proportions at the locus, π (Chen et al., 2009). For the second case, or when IBD is unknown, π can be replaced by twice the kinship coefficient ϕ, which is constant at all loci. The general expression for the variance of Sh and the covariance between Sh and Sl is given in Supplementary Material. When Sh takes the form 4, the X term is a main effect term, say X1, and the actual IBD sharing proportions π are used, Var[Sh] has a closed form involving π and the within-family variance of X1, σ1². This variance is estimated as described in Supplementary Material to obtain the estimate v[Sh] of Var[Sh]. With equal weights for all pairs, the computation involving the IBD sharing probabilities can be simplified as explained in Supplementary Material. When the X term is instead a product term, say X1X2, the corresponding expression involves the within-family variance of the product term X1X2, σ12², which is estimated as described in Supplementary Material.
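The pairwise contrast described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the fat2Lpoly implementation: the function name `family_score`, its arguments and the toy family are hypothetical, and the variance estimate based on IBD sharing or kinship is deliberately left out.

```python
from itertools import product


def family_score(x, y, category, weights=None):
    """Contribution of one family to a pairwise score for one outcome category.

    x: allelic term (e.g. allele count) of each genotyped subject
    y: outcome category of each subject
    category: the outcome category whose coefficient is tested
    weights: optional matrix of pair weights (assumed given; equal weights if None)

    Returns the (weighted) sum of x[i] - x[j] over all pairs with subject i
    in `category` and subject j in any other category.
    """
    in_cat = [i for i, yi in enumerate(y) if yi == category]
    out_cat = [j for j, yj in enumerate(y) if yj != category]
    total = 0.0
    for i, j in product(in_cat, out_cat):
        w = 1.0 if weights is None else weights[i][j]
        total += w * (x[i] - x[j])
    return total


# Toy family (hypothetical data): allele counts and outcome categories.
toy_x = [2, 1, 0, 1]
toy_y = [3, 3, 1, 2]
toy_score = family_score(toy_x, toy_y, category=3)
```

With equal weights the double sum equals (number of subjects outside the category) × (sum of x inside) minus (number inside) × (sum of x outside), which gives a quick check of the loop. A family in which all subjects share the same category contributes 0, in line with the remark that families with several categories represented are the informative ones.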

2.1. Application to the joint modeling of two dichotomous traits using two-locus models

The joint analysis of two dichotomous traits represents an important special case of a polytomous phenotype with four categories. We illustrate such a phenotype by referring to a dichotomous disease trait Y2 and a dichotomous endophenotype Y1, as defined in the introduction. We consider here polytomous models for two markers at unlinked loci which may interact to cause the disease and endophenotype impairment. We assume that association of locus 1 to the endophenotype impairment Y1 = 1 and possibly to the disease Y2 = 1 has already been established, and that we want to detect locus 2, which is undetectable in single-locus analyses, by conditioning on locus 1 with which it interacts. This leads to null hypotheses on a subset of coefficients tested with a statistic as defined in Equation 8.
A first option is to use the full model with distinct coefficients for each disease/endophenotype combination contrasted to the reference category of absence of both the disease and endophenotype impairment. In this model, the logistic function of each non-reference category k = 1, 2, 3 has an intercept βk0 and coefficients βk1, βk2 and βk3 for the allele count X1 at locus 1, the allele count X2 at locus 2 and their product X1X2, respectively. The null hypothesis of the conditional test of locus 2 given locus 1 under the full model is formulated as β12 = β13 = β22 = β23 = β32 = β33 = 0. When the null is rejected, insights on the phenotype category driving the signal can be obtained by examining the Z statistics for each coefficient and the p-values associated with the tests of the subsets of coefficients (β12, β13), (β22, β23) and (β32, β33).
One can also postulate a model for a particular form of interaction between the two loci. We consider a model which we call the endophenotype-to-disease model, where an allele at locus 1 increases susceptibility to the endophenotype impairment Y1 = 1 and possibly to the disease Y2 = 1, and an allele at locus 2 increases susceptibility to the disease in carriers of gene 1 susceptibility genotypes (at higher risk of the endophenotype impairment). For that model we express allele counts as the proportion of a given allele in a genotype, taking values 0, 1/2 and 1. We keep the same notation for the coefficients as in the full model, except for the coefficient βe, which represents the effect on the risk of the endophenotype impairment in non-carriers of the locus 2 tested allele. When the endophenotype-to-disease model holds, the coefficients β33 and βe are of the same sign. The marginal association of X2 to the endophenotype impairment under that model will typically be small. Its direction and magnitude depend on the values of β33 and βe and the distribution of X1. The null hypothesis of the conditional test of locus 2 given locus 1 under the above model bears on the coefficients βe and β33. The alternative hypothesis can be restricted to the values implied by the conceptual model, or a general alternative can be considered, but the alternative space then contains models outside of the conceptual model formulated above.
Alternatively, detection of locus 2 can be attempted by testing a single interaction parameter between X1 and X2, as in the context of a genetic analysis conditional on an environmental exposure (Kraft et al., 2007). Here β33, the interaction parameter for the logistic function contrasting the disease and endophenotype impairment category to the reference category, is the most promising to test to detect effects on the disease and endophenotype impairment jointly.
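To make the structure of the full two-locus model concrete, the sketch below computes the four category probabilities from an intercept, two main effects and a product term per non-reference category. It is an illustration under stated assumptions: the function name `category_probabilities` is hypothetical, and the coefficient values are invented for the example (they are not the values of Table 1).

```python
import math


def category_probabilities(x1, x2, coefs):
    """Category probabilities under a polytomous logistic model.

    x1, x2: allele terms at locus 1 and locus 2 (counts, or proportions 0, 0.5, 1)
    coefs: one tuple (b0, b1, b2, b3) per non-reference category, holding the
           intercept and the coefficients of x1, x2 and the product x1 * x2

    Returns [P(reference), P(category 1), P(category 2), ...].
    """
    linear = [b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 for b0, b1, b2, b3 in coefs]
    denom = 1.0 + sum(math.exp(v) for v in linear)
    return [1.0 / denom] + [math.exp(v) / denom for v in linear]


# Hypothetical coefficients, chosen only for illustration.
example_coefs = [
    (-2.0, 0.7, 0.0, 0.0),  # category 1 (assumed: endophenotype impairment only)
    (-5.0, 0.0, 0.0, 0.0),  # category 2 (assumed: disease only)
    (-5.0, 0.0, 0.0, 2.0),  # category 3 (assumed: disease and impairment)
]
probs = category_probabilities(x1=1.0, x2=0.5, coefs=example_coefs)
```

Setting the coefficients of x2 and of the product term to 0 in every category makes the probabilities independent of x2, which is the situation described by the null hypothesis of the conditional test of locus 2 given locus 1.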

2.2. Software implementation

We have implemented the extension of the GDT to polytomous phenotypes and two loci in the R package fat2Lpoly, standing for Family-based Association Test for 2 Loci and Polytomous phenotypes, available on the CRAN archive at CRAN.R-project.org/package=fat2Lpoly. A function is provided to read phenotype and genotype data, variable names and IBD sharing proportions (if applicable) from input files in the Merlin/QTDT format (www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html) and convert them into R objects. Alternatively, R objects made by the user in the same format can be provided as input. Functions are provided to set up design matrices for the full two-locus polytomous model, the one-locus polytomous model and the endophenotype-to-disease model. User-defined functions setting up customized design matrices can be provided instead of these pre-defined functions.

2.3. Evaluation by simulation of the proposed hypothesis tests under two-locus models

The family structure used in the simulations is a 3-generation 16-member family depicted in Figure 1. The disease and endophenotype status of all family members was assumed to be observed. We generated genotype data for genetic variants with two alleles, such as single nucleotide polymorphisms (SNPs), at two independent loci. The genotypes of pedigree founders were sampled under Hardy-Weinberg equilibrium using risk allele frequencies (RAFs) of 0.1 at locus 1 and 0.3 at locus 2. The transmission of alleles to their descendants was then simulated following the rules of Mendelian inheritance. Two dichotomous phenotypes Y1 and Y2 were generated in a two-step approach: we first simulated from the distribution of Y1 for each subject i, obtained by summing over Y2 in a polytomous model, then from the distribution of the vector Y2|Y1. In the model to simulate Y2|Y1, Y1 is treated as a vector of fixed effects, with the effect of the endophenotype of subject h modulated by the kinship coefficient ϕ between i and h. An additive polygenic effect on the logit of Y2 was also included. In this model, γ′(X, Y) is an abbreviated expression of the model for the disease phenotype given the genotype at major loci and endophenotype status of subject i derived from a polytomous model, and Φ is the kinship matrix between the family members. The parameter σ² controls the degree of polygenic dependence between the disease status Y2 of the family members and the parameter α the degree of genetic dependence of Y2 on Y1 not captured by the genotype at the loci in the model. The parameter ν, between 0 and 1, determines the relative importance of the risk increase 1 − ν due to observing an endophenotype impairment and the risk decrease −ν due to observing the normal level of the endophenotype in a relative. We note this simulation scheme is meant to reproduce the association between disease phenotype and endophenotype status of relatives, not to represent a causal mechanism. Among the simulated families, we kept those with at least a cousin pair with Y2 = 1, i.e., affected by the disease, to mimic the ascertainment process of families in a genetic study.

Figure 1

Structure of simulated families with an example of phenotype realization.

We simulated two scenarios of population origin of the sample: (1) homogeneity: the sample came from a single population where the phenotypes were generated under the polytomous model presented in Table 1. Under this model and with the above RAFs, the disease had a population prevalence of 0.0076 and the endophenotype impairment a prevalence of 0.128; (2) heterogeneity: the sample was a mixture of families from two populations, both represented in equal proportions. In population 1, all intercept coefficients in Table 1 were reduced by 0.5, while in population 2 they were increased by 0.5. This resulted in disease prevalences of 0.005 in population 1 and 0.012 in population 2, and endophenotype impairment prevalences of 0.082 in population 1 and 0.194 in population 2.
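The genotype part of this simulation scheme (founders sampled under Hardy-Weinberg equilibrium, then Mendelian transmission at two independent loci) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names are hypothetical, only one parent-parent-child trio is generated rather than the 16-member pedigree, and the phenotype model is not reproduced.

```python
import random

# Risk allele frequencies stated in the text.
RAF = {"locus1": 0.1, "locus2": 0.3}


def sample_founder(raf, rng):
    """Draw two alleles independently (1 = risk allele).

    Independent draws give Hardy-Weinberg genotype proportions.
    """
    return (int(rng.random() < raf), int(rng.random() < raf))


def transmit(parent, rng):
    """Mendelian transmission: each parental allele is passed with probability 1/2."""
    return rng.choice(parent)


def make_child(father, mother, rng):
    """Child genotype: one allele from the father, one from the mother."""
    return (transmit(father, rng), transmit(mother, rng))


def allele_count(genotype):
    """Number of risk alleles (0, 1 or 2) in a genotype."""
    return sum(genotype)


rng = random.Random(1)  # fixed seed so the example is reproducible
# The two loci are independent, so each locus is simulated separately.
father = {locus: sample_founder(raf, rng) for locus, raf in RAF.items()}
mother = {locus: sample_founder(raf, rng) for locus, raf in RAF.items()}
child = {locus: make_child(father[locus], mother[locus], rng) for locus in RAF}
```

Repeating `make_child` down the generations of a pedigree, with married-in spouses drawn by `sample_founder`, would give genotypes for an extended family such as the one in Figure 1.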

Table 1

Regression coefficients of the example polytomous model.

Coef.	Value	Coef.	Value	Coef.	Coef.	Value
β₁₀	−2	β₁₁	log(2)	β₁₂	β₁₃	−log(2)
β₂₀	−5.5	β₂₁	0	β₂₂	β₂₃	0
β₃₀	−5.5	β₃₁	0	β₃₂	β₃₃	log(16)

To verify the Type I error of tests of association to locus 2 under the null hypothesis of no association to locus 2, but in presence of genetic linkage at that locus, we generated an additional biallelic variant at locus 2 independent from the causal variant at that locus, i.e., in linkage equilibrium with it. In the homogeneous population, the minor allele frequency of that marker was equal to the RAF of the causal variant, but in the mixture of two populations the minor allele frequency was 0.1 in population 1 and 0.5 in population 2, creating population structure at that locus. For the power evaluation, we tested association to the actual causal variant at locus 2. The tests evaluated include the tests of the null hypotheses 11, which we denote “cpoly,” 13, which we denote “(βe, β33),” and β33 = 0. We also evaluated a single-locus polytomous model (model 11 with X2 only). The coefficients in that model are labeled with the superscript (1L), and we tested the null hypothesis that the entire coefficient vector β˜(1L) = 0 as well as β3(1L) = 0. For the evaluation of the Type I error, Wald tests of the coefficients of the one-locus model based on GEEs were also performed. However, these tests were not used for the power comparison, since they had inflated Type I error under our heterogeneity scenario where population stratification was present.
In presence of population stratification, previously available valid tests are restricted to a dichotomous outcome and a single marker. Analysis options are then limited to testing association of a single marker to the dichotomous endophenotype Y1 and disease status Y2, either in the full sample or, in the case of Y2, in a stratum defined by Y1. This is akin to the strategy for detecting modifier genes conferring susceptibility to a specific phenotype (i.e., the disease), which consists in testing association to the specific phenotype among subjects with a broader phenotype (i.e., the endophenotype impairment) (Bureau et al., 2012).
We therefore compared the power of various tests derived under our extension of the GDT against the single-marker GDT for dichotomous outcomes applied to the locus 2 causal variant with three phenotype definitions: (1) the disease status Y2 (standard analysis noted simply GDT), (2) the disease status Y2 in the subset of subjects with Y1 = 1 (endophenotype impairment), setting the phenotype of other subjects to unknown (GDTc), and (3) the endophenotype status Y1 (GDTe). We also compared our tests to score tests of coefficients of the usual two-locus logistic model for a dichotomous trait Y, logit P[Y = 1 | X1, X2] = η0 + η1X1 + η2X2 + η3X1X2. The 2 d.f. test of the null hypothesis η2 = η3 = 0 is denoted “cdisease” when the phenotype tested is Y2 and “cendo” when the phenotype tested is Y1.
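The phenotype definitions compared above amount to different recodings of the pair (Y1, Y2). The sketch below spells them out. The function names and the numbering of the four categories (0 = neither, 1 = endophenotype impairment only, 2 = disease only, 3 = both) are assumptions made for the example, and `None` stands for a phenotype set to unknown.

```python
def polytomous_category(y1, y2):
    """Four-category phenotype from endophenotype y1 and disease y2 (each 0 or 1).

    Assumed labels: 0 = neither, 1 = endophenotype impairment only,
    2 = disease only, 3 = disease and endophenotype impairment.
    """
    return y1 + 2 * y2


def dichotomous_phenotypes(y1, y2):
    """Phenotype codings used by the three single-marker GDT analyses."""
    return {
        "GDT": y2,                         # disease status in the full sample
        "GDTc": y2 if y1 == 1 else None,   # disease status among impaired subjects only
        "GDTe": y1,                        # endophenotype status
    }


# Example: a subject with the disease but a normal endophenotype.
example = dichotomous_phenotypes(y1=0, y2=1)
```

In this coding a subject with the disease and a normal endophenotype counts as affected for GDT, is unknown for GDTc and is unaffected for GDTe, whereas the polytomous analysis keeps the subject in its own category.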

3. Results

3.1. Evaluation of the type I error

The Type I error was evaluated on 1000 replicate samples of 100 families. The results of the simulation under the null hypothesis in Table 2 show that the nominal Type I error rate was respected under both scenarios for all test statistics from our polytomous extension of the GDT. The Type I error rates of the tests conditional on locus 1 were similar for weight definitions 6 and 7, so only results for the former are shown. With both definitions the rates were below the nominal level, making these tests conservative. By contrast, the Type I error rates of the Wald tests based on GEE estimates were at the nominal level only under the homogeneous sample scenario, and were severely inflated under the heterogeneous sample scenario.

Table 2

Estimations of Type I error on 1000 replicate samples of 100 families.

	GEE		Conditional likelihood
	Single locus		Single locus		Given other locus^a
	β₃(1L)	β˜(1L)	β₃(1L)	β˜(1L)	β₃₃	(β_e,β₃₃)	cpoly
HOMOGENEOUS POPULATION
α = 0.01:	0.009	0.015	0.006	0.012	0.001	0.001	0.007
α = 0.05:	0.053	0.060	0.051	0.045	0.019	0.003	0.029
MIXTURE OF TWO POPULATIONS
α = 0.01:	0.102	0.762	0.012	0.010	0.002	0.001	0.007
α = 0.05:	0.237	0.906	0.048	0.053	0.025	0.003	0.033

^a Subject pairs were weighted using expression 6.
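When reading Table 2 it helps to know how much an estimate based on 1000 replicates can deviate from the nominal level by chance alone. The sketch below computes the usual normal-approximation interval around the nominal level; the function name `mc_interval` is hypothetical, and this check is an illustration added here, not part of the original analysis.

```python
import math


def mc_interval(alpha, n_rep, z=1.96):
    """Approximate 95% range of an estimated rejection rate when the true rate is alpha.

    Uses the binomial standard error sqrt(alpha * (1 - alpha) / n_rep).
    """
    se = math.sqrt(alpha * (1.0 - alpha) / n_rep)
    return alpha - z * se, alpha + z * se


low, high = mc_interval(alpha=0.05, n_rep=1000)

# Estimated rates at alpha = 0.05 in the homogeneous population (Table 2),
# for the three tests conditional on the other locus.
conditional_rates = {"beta33": 0.019, "(beta_e, beta33)": 0.003, "cpoly": 0.029}
below_range = {name: rate < low for name, rate in conditional_rates.items()}
```

At α = 0.05 the range is roughly 0.036 to 0.064, so the single-locus score test rates (0.045 to 0.053) are compatible with the nominal level, while the rates of the conditional tests fall below the range, consistent with their description as conservative.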


3.2. Evaluation of the power

Under the simulated scenario the endophenotype-to-disease model holds. While the test of the null hypothesis 13 has some power, testing β33 = 0 (the interaction parameter for the combination of disease and endophenotype impairment) achieves the highest power among the tests considered (Figure 2). Using weight definition 7 instead of 6 led to nearly identical power (results not shown). Under this scenario, the single-locus tests of the allele count at locus 2, whether of the coefficient for the same phenotypic category, β3(1L) = 0, or of the entire vector, β˜(1L) = 0, do not provide a measurable power improvement over the GDT applied to the disease status in the subset of subjects with endophenotype impairment. Further comparisons of testing strategies under a variety of scenarios will be reported elsewhere.

Figure 2

Power of various within-family score tests to detect locus 2. See text for definitions of the acronyms of the tests. For tests conditional on another locus, subject pairs were weighted using expression 6.

3.3. Application to major psychosis and visual episodic memory

Schizophrenia (SZ) and bipolar disorder (BP) are two forms of the spectrum of major psychosis (MP), which also includes schizo-affective disorder. SZ and BP co-aggregate in families (Van Snellenberg and de Candia, 2009), and share genetic liability (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). Various cognitive domains are widely recognized as endophenotypes of MP (Bora et al., 2009; Ivleva et al., 2010). In the Eastern Quebec kindred study, visual episodic memory (VisEM) was found to be impaired in both SZ and BP patients and non-affected adult relatives of these patients (Maziade et al., 2011). In that same family sample, we recently replicated an association between the T allele of SNP rs1156026 and SZ that we had previously detected in another sample (Bureau et al., 2013). All the elements required for the application of our extension of the GDT to markers genotyped in the family sample are present: a diagnosis within the spectrum of MP as the disease phenotype, a VisEM measurement dichotomized as presence/absence of deficit as the endophenotype and the SNP rs1156026 as the established risk locus. Given the small number of subjects with cognitive measurements, this analysis is not sufficiently powered to draw conclusions and must be considered illustrative. The small sample size also limited us to an analysis of MP globally, without separating SZ and BP. VisEM was measured by the performance on the delayed recall of the Rey figure task (Meyers and Meyers, 1995), defining the affected status as a performance at or below the 4th percentile of the distribution of age- and gender-matched controls. We retained the 14 informative families, defined as containing at least one MP affected subject with a visual memory measurement and subjects in at least one other phenotypic category. Table 3 presents the joint distribution of MP and VisEM in the 133 genotyped subjects from these families along with the frequency of the rs1156026 T allele.
Although the frequency of the T allele is greatly increased in subjects with MP and the VisEM impairment compared to normal subjects (and this increase is statistically significant in a population-level comparison), the within-family score test of the corresponding coefficient has a high p-value, suggesting that the difference in T allele frequency is mostly between families rather than within families.

Table 3

Joint distribution of major psychosis and visual episodic memory deficits along with the frequency of the rs1156026 T allele.

		VisEM <= 4th perc				VisEM > 4th perc				Total
		n₁	Freq T	p_GEE^a	p_1L^b	n₀	Freq T	p_GEE^a	p_1L^b	n_.	Freq T
MP	Yes	21 (41%)	0.52	0.0011	0.34	30	0.40	0.040	0.310	51 (38%)	0.45
	No	13 (16%)	0.31	0.97	0.72	69	0.30			82 (62%)	0.30
	Total	34 (26%)	0.44			99	0.33			133	0.36

^a p-values of Wald tests of the coefficients of the one locus polytomous model estimated using generalized estimating equations (GEE).

^b p-values of within-family score tests of the coefficients of the one locus polytomous model.

We tested association to 80 SNPs in genomic regions where genetic linkage to SZ, BP, or MP was previously detected in that family sample, on the p arm of chromosomes 6, 8, and 16 and the q arm of chromosomes 12 and 18 (Maziade et al., 2005). We applied the same tests as in the simulation study. SNPs where a p-value < 0.05 was obtained in at least one analysis are shown in Table 4.

Table 4

Results for SNPs where a p-value < 0.05 was obtained in at least one analysis.

SNP	Chr	Pos (Mb)		MAF (n)
				Y₁ = 0, Y₂ = 0	Y₁ = 0, Y₂ = 1	Y₁ = 1, Y₂ = 0	Y₁ = 1, Y₂ = 1
rs1087266	6	24.4		0.39 (42)	0.26 (25)	0.60 (5)	0.55 (19)
rs7500550	16	19.1		0.11 (41)	0.16 (25)	0.17 (6)	0.03 (18)
TESTS p-VALUES
SNP	GDT	GDTc	GDTe	β˜(1L)	β₃₃	(β_e,β₃₃)	cpoly
rs1087266	0.48	0.25	0.005	0.005	0.032	0.085	0.006
rs7500550	0.52	0.17	0.57	0.040	0.019	0.064	0.015

For tests conditioning on rs1156026 genotypes, subject pairs were weighted using expression 6.

The results for rs7500550 illustrate that tests of the joint MP-VisEM phenotype conditional on the rs1156026 T allele count can detect associations to SNPs where the test of the MP or VisEM phenotype alone did not. In this case, the rare allele was negatively associated to MP with VisEM impairment, with Z statistics of −2.66 for the X2 and −2.34 for the X1X2 terms (p = 0.0019 for the test of the coefficients of both terms), while it was positively associated to a lesser extent to MP without VisEM impairment, with Z statistics of 2.54 for the X2 and 2.07 for the X1X2 terms (p = 0.005 for the test of the coefficients of both terms). The signal was thus driven by opposite associations to these two phenotypic categories. The signal at rs1087266 was detected by single-locus tests with lower p-values than by tests conditioning on rs1156026. In that case, testing association with VisEM status was the key to detect the signal. Nonetheless, the conditional test of the polytomous phenotype provides a p-value (0.006) similar to that of the GDT applied to VisEM status (0.005 for GDTe in Table 4). Given the limited power of the analysis and the number of SNPs tested, these results cannot be considered statistically significant once multiple testing is taken into account.
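The link between the reported Z statistics and p-values, and the effect of multiple testing, can be illustrated with a short calculation. The Bonferroni correction over the 80 SNPs is an assumption made for this example only; the text does not state which correction the authors had in mind, and several tests were applied per SNP. The function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a Z statistic under a standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Z statistic of the X1X2 term for MP with VisEM impairment at rs7500550.
p_interaction = two_sided_p(-2.34)

# Illustrative threshold for 80 SNPs tested at a family-wise level of 0.05.
threshold = bonferroni_threshold(0.05, 80)

# Smallest p-value reported in Table 4.
smallest_table4_p = 0.005
passes = smallest_table4_p < threshold
```

The p-value obtained for Z = −2.34 is about 0.019, matching the β₃₃ entry for rs7500550 in Table 4, and the smallest p-value in that table (0.005) is well above the illustrative threshold of 0.000625.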

4. Discussion

We have extended the GDT, a score test of genetic association applicable with extended families, to enable testing association with a polytomous phenotype. Another extension is the use of a model of association with two genetic loci, which makes it possible to test association at a locus conditional on the genotype of a marker at a known risk locus, to exploit interaction between the two. A software implementation in the form of an R package has been made freely available. The within-family analysis framework that we adopted has the advantage of protecting against Type I error inflation due to population stratification. Polytomous phenotypes can be more informative than dichotomous ones to detect genetic associations, as illustrated in our simulation study. The proposed score tests also suffer from limitations. First, score tests provide no estimates of the regression parameters being tested. Conditional maximum likelihood estimation would be applicable only with exchangeable relatives, which is not required for the GDT as explained in Supplementary Material. We are exploring the robustness and power of conditional maximum likelihood estimation in sibships from extended families. Second, within-family analysis tends to be less powerful than population-level analysis, which also exploits between-family information. Furthermore, the score tests for one locus conditioning on another tend to be conservative, even with the weight definition 6 neglecting variability from estimating the coefficients left free under the null hypothesis. Our simulation studies illustrate that power remains limited despite large sample sizes (1600 subjects in 100 families) and large effect sizes (interaction odds ratios of 16). Extracting the most power from the data is particularly important when phenotypic measures are expensive to obtain, such as the cognitive measurements in our example. Population analyses are then attractive, with an adjustment for population structure using genomewide SNP genotypes (Price et al., 2006).
Methods for population analysis of polytomous phenotypes are not well developed, and will be the object of future work.

Author contributions

Alexandre Bureau defined the research questions, derived the proposed statistical test, wrote part of the R implementation, conceived the simulation study, oversaw the analysis of the major psychosis data and drafted the manuscript. Jordie Croteau wrote part of the R implementation, performed the simulation study and the analysis of the major psychosis data, and created figures and tables. Yvon C. Chagnon oversaw the genotyping of the Eastern Quebec kindred study and contributed to the design of the genetic aspects of that study. Marc-André Roy contributed to the design of the genetic and clinical aspects of the Eastern Quebec kindred study, established diagnosis of patients and made substantial revisions to the manuscript. Michel Maziade designed the genetic and clinical aspects of the Eastern Quebec kindred study, oversaw clinical data collection and established diagnosis of patients. All authors approved the version submitted for publication and agree to be accountable for all aspects of the work.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

15 in total

1. Principal components analysis corrects for stratification in genome-wide association studies.

Authors: Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal: Nat Genet Date: 2006-07-23 Impact factor: 38.330

2. Follow-up of a major psychosis linkage site in 13q13-q14 reveals significant association in both case-control and family samples.

Authors: Alexandre Bureau; Yvon C Chagnon; Jordie Croteau; Alain Fournier; Marc-André Roy; Thomas Paccalet; Chantal Mérette; Michel Maziade
Journal: Biol Psychiatry Date: 2013-04-18 Impact factor: 13.382

3. Polychotomous logistic regression methods for matched case-control studies with multiple case or control groups.

Authors: K Y Liang; W F Stewart
Journal: Am J Epidemiol Date: 1987-04 Impact factor: 4.897

4. The endophenotype concept in psychiatry: etymology and strategic intentions.

Authors: Irving I Gottesman; Todd D Gould
Journal: Am J Psychiatry Date: 2003-04 Impact factor: 18.112

5. Shared and specific susceptibility loci for schizophrenia and bipolar disorder: a dense genome scan in Eastern Quebec families.

Authors: M Maziade; M-A Roy; Y C Chagnon; D Cliche; J-P Fournier; N Montgrain; C Dion; J-C Lavallée; Y Garneau; N Gingras; L Nicole; A Pirès; A-M Ponton; A Potvin; H Wallot; C Mérette
Journal: Mol Psychiatry Date: 2005-05 Impact factor: 15.992

6. Genetic susceptibility to distinct bladder cancer subphenotypes.

Authors: Lin T Guey; Montserrat García-Closas; Cristiane Murta-Nascimento; Josep Lloreta; Laia Palencia; Manolis Kogevinas; Nathaniel Rothman; Gemma Vellalta; M Luz Calle; Gaëlle Marenne; Adonina Tardón; Alfredo Carrato; Reina García-Closas; Consol Serra; Debra T Silverman; Stephen Chanock; Francisco X Real; Núria Malats
Journal: Eur Urol Date: 2009-08-12 Impact factor: 20.096

7. Cognitive endophenotypes of bipolar disorder: a meta-analysis of neuropsychological deficits in euthymic patients and their first-degree relatives.

Authors: Emre Bora; Murat Yucel; Christos Pantelis
Journal: J Affect Disord Date: 2008-08-05 Impact factor: 4.839

8. Meta-analytic evidence for familial coaggregation of schizophrenia and bipolar disorder.

Authors: Jared X Van Snellenberg; Teresa de Candia
Journal: Arch Gen Psychiatry Date: 2009-07

9. Verbal and visual memory impairments among young offspring and healthy adult relatives of patients with schizophrenia and bipolar disorder: selective generational patterns indicate different developmental trajectories.

Authors: Michel Maziade; Nancie Rouleau; Chantal Mérette; Caroline Cellard; Marco Battaglia; Cecilia Marino; Valérie Jomphe; Elsa Gilbert; Amélie Achim; Roch-Hugo Bouchard; Thomas Paccalet; Marie-Eve Paradis; Marc-André Roy
Journal: Schizophr Bull Date: 2010-04-21 Impact factor: 9.306

10. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

Authors: S Hong Lee; Stephan Ripke; Benjamin M Neale; Stephen V Faraone; Shaun M Purcell; Roy H Perlis; Bryan J Mowry; Anita Thapar; Michael E Goddard; John S Witte; Devin Absher; Ingrid Agartz; Huda Akil; Farooq Amin; Ole A Andreassen; Adebayo Anjorin; Richard Anney; Verneri Anttila; Dan E Arking; Philip Asherson; Maria H Azevedo; Lena Backlund; Judith A Badner; Anthony J Bailey; Tobias Banaschewski; Jack D Barchas; Michael R Barnes; Thomas B Barrett; Nicholas Bass; Agatino Battaglia; Michael Bauer; Mònica Bayés; Frank Bellivier; Sarah E Bergen; Wade Berrettini; Catalina Betancur; Thomas Bettecken; Joseph Biederman; Elisabeth B Binder; Donald W Black; Douglas H R Blackwood; Cinnamon S Bloss; Michael Boehnke; Dorret I Boomsma; Gerome Breen; René Breuer; Richard Bruggeman; Paul Cormican; Nancy G Buccola; Jan K Buitelaar; William E Bunney; Joseph D Buxbaum; William F Byerley; Enda M Byrne; Sian Caesar; Wiepke Cahn; Rita M Cantor; Miguel Casas; Aravinda Chakravarti; Kimberly Chambert; Khalid Choudhury; Sven Cichon; C Robert Cloninger; David A Collier; Edwin H Cook; Hilary Coon; Bru Cormand; Aiden Corvin; William H Coryell; David W Craig; Ian W Craig; Jennifer Crosbie; Michael L Cuccaro; David Curtis; Darina Czamara; Susmita Datta; Geraldine Dawson; Richard Day; Eco J De Geus; Franziska Degenhardt; Srdjan Djurovic; Gary J Donohoe; Alysa E Doyle; Jubao Duan; Frank Dudbridge; Eftichia Duketis; Richard P Ebstein; Howard J Edenberg; Josephine Elia; Sean Ennis; Bruno Etain; Ayman Fanous; Anne E Farmer; I Nicol Ferrier; Matthew Flickinger; Eric Fombonne; Tatiana Foroud; Josef Frank; Barbara Franke; Christine Fraser; Robert Freedman; Nelson B Freimer; Christine M Freitag; Marion Friedl; Louise Frisén; Louise Gallagher; Pablo V Gejman; Lyudmila Georgieva; Elliot S Gershon; Daniel H Geschwind; Ina Giegling; Michael Gill; Scott D Gordon; Katherine Gordon-Smith; Elaine K Green; Tiffany A Greenwood; Dorothy E Grice; Magdalena Gross; Detelina Grozeva; Weihua Guan; Hugh Gurling; 
Lieuwe De Haan; Jonathan L Haines; Hakon Hakonarson; Joachim Hallmayer; Steven P Hamilton; Marian L Hamshere; Thomas F Hansen; Annette M Hartmann; Martin Hautzinger; Andrew C Heath; Anjali K Henders; Stefan Herms; Ian B Hickie; Maria Hipolito; Susanne Hoefels; Peter A Holmans; Florian Holsboer; Witte J Hoogendijk; Jouke-Jan Hottenga; Christina M Hultman; Vanessa Hus; Andrés Ingason; Marcus Ising; Stéphane Jamain; Edward G Jones; Ian Jones; Lisa Jones; Jung-Ying Tzeng; Anna K Kähler; René S Kahn; Radhika Kandaswamy; Matthew C Keller; James L Kennedy; Elaine Kenny; Lindsey Kent; Yunjung Kim; George K Kirov; Sabine M Klauck; Lambertus Klei; James A Knowles; Martin A Kohli; Daniel L Koller; Bettina Konte; Ania Korszun; Lydia Krabbendam; Robert Krasucki; Jonna Kuntsi; Phoenix Kwan; Mikael Landén; Niklas Långström; Mark Lathrop; Jacob Lawrence; William B Lawson; Marion Leboyer; David H Ledbetter; Phil H Lee; Todd Lencz; Klaus-Peter Lesch; Douglas F Levinson; Cathryn M Lewis; Jun Li; Paul Lichtenstein; Jeffrey A Lieberman; Dan-Yu Lin; Don H Linszen; Chunyu Liu; Falk W Lohoff; Sandra K Loo; Catherine Lord; Jennifer K Lowe; Susanne Lucae; Donald J MacIntyre; Pamela A F Madden; Elena Maestrini; Patrik K E Magnusson; Pamela B Mahon; Wolfgang Maier; Anil K Malhotra; Shrikant M Mane; Christa L Martin; Nicholas G Martin; Manuel Mattheisen; Keith Matthews; Morten Mattingsdal; Steven A McCarroll; Kevin A McGhee; James J McGough; Patrick J McGrath; Peter McGuffin; Melvin G McInnis; Andrew McIntosh; Rebecca McKinney; Alan W McLean; Francis J McMahon; William M McMahon; Andrew McQuillin; Helena Medeiros; Sarah E Medland; Sandra Meier; Ingrid Melle; Fan Meng; Jobst Meyer; Christel M Middeldorp; Lefkos Middleton; Vihra Milanova; Ana Miranda; Anthony P Monaco; Grant W Montgomery; Jennifer L Moran; Daniel Moreno-De-Luca; Gunnar Morken; Derek W Morris; Eric M Morrow; Valentina Moskvina; Pierandrea Muglia; Thomas W Mühleisen; Walter J Muir; Bertram Müller-Myhsok; Michael Murtha; Richard M 
Myers; Inez Myin-Germeys; Michael C Neale; Stan F Nelson; Caroline M Nievergelt; Ivan Nikolov; Vishwajit Nimgaonkar; Willem A Nolen; Markus M Nöthen; John I Nurnberger; Evaristus A Nwulia; Dale R Nyholt; Colm O'Dushlaine; Robert D Oades; Ann Olincy; Guiomar Oliveira; Line Olsen; Roel A Ophoff; Urban Osby; Michael J Owen; Aarno Palotie; Jeremy R Parr; Andrew D Paterson; Carlos N Pato; Michele T Pato; Brenda W Penninx; Michele L Pergadia; Margaret A Pericak-Vance; Benjamin S Pickard; Jonathan Pimm; Joseph Piven; Danielle Posthuma; James B Potash; Fritz Poustka; Peter Propping; Vinay Puri; Digby J Quested; Emma M Quinn; Josep Antoni Ramos-Quiroga; Henrik B Rasmussen; Soumya Raychaudhuri; Karola Rehnström; Andreas Reif; Marta Ribasés; John P Rice; Marcella Rietschel; Kathryn Roeder; Herbert Roeyers; Lizzy Rossin; Aribert Rothenberger; Guy Rouleau; Douglas Ruderfer; Dan Rujescu; Alan R Sanders; Stephan J Sanders; Susan L Santangelo; Joseph A Sergeant; Russell Schachar; Martin Schalling; Alan F Schatzberg; William A Scheftner; Gerard D Schellenberg; Stephen W Scherer; Nicholas J Schork; Thomas G Schulze; Johannes Schumacher; Markus Schwarz; Edward Scolnick; Laura J Scott; Jianxin Shi; Paul D Shilling; Stanley I Shyn; Jeremy M Silverman; Susan L Slager; Susan L Smalley; Johannes H Smit; Erin N Smith; Edmund J S Sonuga-Barke; David St Clair; Matthew State; Michael Steffens; Hans-Christoph Steinhausen; John S Strauss; Jana Strohmaier; T Scott Stroup; James S Sutcliffe; Peter Szatmari; Szabocls Szelinger; Srinivasa Thirumalai; Robert C Thompson; Alexandre A Todorov; Federica Tozzi; Jens Treutlein; Manfred Uhr; Edwin J C G van den Oord; Gerard Van Grootheest; Jim Van Os; Astrid M Vicente; Veronica J Vieland; John B Vincent; Peter M Visscher; Christopher A Walsh; Thomas H Wassink; Stanley J Watson; Myrna M Weissman; Thomas Werge; Thomas F Wienker; Ellen M Wijsman; Gonneke Willemsen; Nigel Williams; A Jeremy Willsey; Stephanie H Witt; Wei Xu; Allan H Young; Timothy W Yu; 
Stanley Zammit; Peter P Zandi; Peng Zhang; Frans G Zitman; Sebastian Zöllner; Bernie Devlin; John R Kelsoe; Pamela Sklar; Mark J Daly; Michael C O'Donovan; Nicholas Craddock; Patrick F Sullivan; Jordan W Smoller; Kenneth S Kendler; Naomi R Wray
Journal: Nat Genet Date: 2013-08-11 Impact factor: 38.330
