Disentangling group specific QTL allele effects from genetic background epistasis using admixed individuals in GWAS: An application to maize flowering.

Simon Rio¹, Tristan Mary-Huard^1,2, Laurence Moreau¹, Cyril Bauland¹, Carine Palaffre³, Delphine Madur¹, Valérie Combes¹, Alain Charcosset¹.

Abstract

When handling a structured population in association mapping, group-specific allele effects may be observed at quantitative trait loci (QTLs) for several reasons: (i) a different linkage disequilibrium (LD) between SNPs and QTLs across groups, (ii) group-specific genetic mutations in QTL regions, and/or (iii) epistatic interactions between QTLs and other loci that have differentiated allele frequencies between groups. We present here a new genome-wide association (GWAS) approach to identify QTLs exhibiting such group-specific allele effects. We developed genetic materials including admixed progeny from different genetic groups with known genome-wide ancestries (local admixture). A dedicated statistical methodology was developed to analyze pure and admixed individuals jointly, allowing one to disentangle the factors causing the heterogeneity of allele effects across groups. This approach was applied to maize by developing an inbred "Flint-Dent" panel including admixed individuals that was evaluated for flowering time. Several associations were detected revealing a wide range of configurations of allele effects, both at known flowering QTLs (Vgt1, Vgt2 and Vgt3) and new loci. We found several QTLs whose effect depended on the group ancestry of alleles while others interacted with the genetic background. Our GWAS approach provides useful information on the stability of QTL effects across genetic groups and can be applied to a wide range of species.

Year: 2020 PMID: 32130208 PMCID: PMC7075643 DOI: 10.1371/journal.pgen.1008241

Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917

Introduction

Quantitative traits are genetically determined by numerous regions of the genome, also known as quantitative trait loci (QTLs). The advent of high density genotyping of single nucleotide polymorphisms (SNPs) has opened the way to the identification of QTLs in diversity panels. These studies, referred to as genome-wide association studies (GWAS), use the linkage disequilibrium (LD) between the SNPs and causal variants at QTLs underlying the traits of interest. The panels evaluated in GWAS often include sets of individuals with complex pedigrees or genetic structure [1]. The latter is a common feature in human, animal and plant species and arises when groups of individuals cease to mate with each other and start to be subjected to different evolutionary forces, such as drift or selection [2]. Applying GWAS in a diversity panel including individuals from different groups raises the issue of spurious associations. The stratification of a population into genetic groups generates LD between loci that are differentiated between groups but not necessarily genetically linked. When a given trait is characterized by contrasted group-specific means, all these SNPs will correlate to it and may be detected as false positives. An efficient control of these spurious associations can be done by taking structure and kinship into account in the statistical model [1, 3]. This procedure will however limit the statistical power at differentiated SNPs, making them difficult to detect in multi-group GWAS, especially in case of rare alleles [4]. In a structured population, group-specific allele effects can be observed at SNPs, and testing an overall effect using a standard GWAS model may not be effective if the QTL effect is of opposite sign in the different groups. Such effects can result from group differences in LD between SNPs and QTLs across genetic groups. 
A different LD extent or linkage phase between linked loci can be explained by specific dynamics of population size such as bottlenecks or expansions [5, 6]. Such patterns of LD were identified in numerous species including human [7, 8], dairy and beef cattle [9, 10], pig [11], wheat [12] and maize [13-16]. A genetic mutation appearing in a QTL region may also lead to group-specific allele effects if it occurred in a founder specific to the genetic group. Several Mendelian syndromes of obesity were shown to result from mutations within specific ethnicities in human [17]. Another possibility is that QTLs interact with other loci that have differentiated allele frequencies between groups (i.e. they interact with the genetic background). In human, this possibility was discussed for a candidate gene associated with a higher risk of myocardial infarction in African American than in European populations [18, 19]. Another example is a SNP in the promoter region of the HNF4A gene, which was associated with a higher risk of developing type 2 diabetes in Ashkenazi compared to United Kingdom populations [20]. This locus was later shown to interact with another gene in the Ashkenazi population [21]. In maize, evidence of QTLs with group-specific allele effects can also be found, even though the cause of these differences remains unclear. The presence of allelic series has been demonstrated for QTLs associated with flowering time, including Vgt1 [22]. A QTL with group-specific allele effects was also identified in a maize diversity panel for a phenology trait [23]. More generally, studying the stability of QTL allele effects across genetic backgrounds is an important issue. In human, it determines the ability of a genetic marker to predict the predisposition of an individual to develop a genetic disease across ethnic groups. 
In plant or animal breeding, it conditions the success of introgressing a favorable allele coming from a source of diversity into elite genetic material. Different GWAS strategies were adopted to address this issue depending on the species. In human, GWAS mostly focused on a specific genetic group, and these group-specific studies were compared later through meta-analyses [24, 25]. Some of these meta-analyses revealed highly conserved effects between populations [26, 27] while others revealed more differences [28]. In dairy cattle, the first GWAS focused on a specific breed [29-31]. More recently, multi-breed GWAS were conducted to refine QTL locations by taking advantage of the low LD extent observed in such composite populations [32-34]. In maize, the possibility of using seeds from different origins and generations led geneticists to assemble GWAS panels with a broad range of genetic materials [35-37]. These panels often include a limited proportion of admixed individuals that were derived from crosses between individuals from different genetic groups. The genomes of these admixed individuals consist of mosaics of fragments with different ancestries. Admixture events are a common feature in living species and can contribute to the successful colonization of new environments [38, 39]. In plants, innovative admixed genetic materials were created to enable high statistical power of QTL detection along with a wide spectrum of genetic diversity studied, such as nested association mapping (NAM) [40] or multi-parent advanced generation inter-cross (MAGIC) [41]. Both NAM and MAGIC populations are of great interest to study the stability of QTL effects in a wide range of genetic backgrounds. However, they generally include a limited number of founders and do not address the stability of QTL allele effects across genetic groups. 
This study aimed at evaluating the interest of producing admixed individuals, derived from a large set of parents, in order to decipher the genetic architecture of a trait using innovative GWAS models. The objectives were (i) to demonstrate the interest of multi-group analyses to identify new QTLs, (ii) to highlight the interest of applying multi-group GWAS models to identify group-specific allele effects at QTLs and (iii) to show how admixed individuals can help to disentangle the factors causing the heterogeneity of allele effects across groups: local genomic differences or epistatic interactions between QTLs and the genetic background. To our knowledge, no method has been proposed in the literature to address the last objective. This method was applied to a maize inbred population evaluated for flowering traits, including dent, flint and admixed lines. Maize flowering time is an interesting trait to analyze in quantitative genetics studies. It is considered as a major adaptive trait by tailoring vegetative and reproductive growth phases to local environmental conditions.

Materials and methods

Genetic material and genotypic data

Genetic material consisted of a panel of 970 maize inbred lines assembled within the “Amaizing” project. It gathered 300 dent lines, 304 flint lines and 366 admixed doubled haploids, further referred to as admixed lines. The dent lines were those included in the “Amaizing Dent” panel [42] and the flint lines were those included in the “CF-Flint” panel [16]. The dent and flint lines aimed at representing the diversity of their respective heterotic groups used in European breeding and included several breeding generations. The admixed lines were derived from 206 hybrids between flint and dent lines, mated according to a sparse factorial design (Fig 1), followed by in situ gynogenesis [43] to produce fixed admixed inbred lines. Each dent or flint line was involved in 0 to 11 hybrids (1.21 on average), each leading to 1 to 4 admixed lines (1.77 on average). In total, 171 dent lines and 172 flint lines were involved as parents of admixed lines.

Fig 1

Diagram of admixed lines production from hybrids obtained by mating dent and flint lines according to a sparse factorial design.

All the flint and dent lines were genotyped using the 600K Affymetrix Maize Genotyping Array [44]. Residual heterozygous data was treated as missing and all missing values were imputed independently within each group using Beagle v.3.3.2 and default parameters [45]. The few heterozygous genotypic datapoints imputed by Beagle (0.00084% of all datapoints) were randomly assigned to homozygous genotypes. The admixed lines were genotyped with a 15K chip provided by the private company Limagrain which included a reduced set of SNPs from the 50K Illumina MaizeSNP50 BeadChip [46]. Eight check lines were genotyped with both 600K and 15K genotyping technologies to standardize the reference alleles (0/1) on the set of shared SNPs between the 600K and 15K datasets (9,015 SNPs). Admixed lines were then imputed to 600K SNPs using the following procedure, illustrated in S1 Fig. The positions of recombination breakpoints and the parental origins of the alleles for admixed lines were determined with the set of 9,015 shared SNPs. SNPs for which parental lines carry different alleles allowed us to identify the parental line that transmitted its allele to its admixed progeny. For a given admixed line, changes of parental origins of alleles along a given chromosome indicated the location of recombination breakpoints. A smoothing of parental allele origins was performed for the few SNPs indicating discordant information with respect to the chromosome block in which they were located. In this case, we considered the underlying genotypic datapoint as missing. Parental origins of alleles in admixed lines were imputed up to 600K using adjacent SNP information. If a set of SNPs to be imputed was located within a recombination interval, the new position of the breakpoint was positioned at half of that ordered set, according to the physical position of the SNPs along the chromosome (average proportions of SNPs located within such intervals was 0.93% for a given admixed individual). 
Alleles at SNPs were then imputed based on their origin using parental genotypic data. The MITE associated with the flowering QTL Vgt1 [47, 48] was also genotyped for all the individuals (0: absence, 1: presence). There was a total of 482,013 polymorphic SNPs in this dataset, for which we had information for each individual concerning the SNP allele (0/1), its ancestry (dent/flint) and the genetic background (dent/flint/admixed) in which it was observed. The dent genome proportion of the admixed lines ranged from 0.16 to 0.86 with a mean equal to 0.51 (S2 Fig). Possible selection biases were studied along the genome by comparing the observed allele frequencies with the expected allele frequencies given the pedigree. No major pattern was observed, suggesting no or minor selection biases among the admixed lines (S3 Fig). A PCoA was performed on genetic distances computed as D = 1 − K, with K being the kinship coefficient between lines l and l′ computed following Eq (2)—see below—assuming a common genetic background for all individuals, i.e. using an average frequency of allele 1 at each locus. The flint and dent lines are clearly distinguished on the two principal coordinates, with a small overlapping region in the center of the graph, while the admixed lines fill the genetic space between the two groups (Fig 2). The same PCoA calculated using the set of 9,015 shared SNPs between the 600K and 15K datasets showed a very similar structure pattern on the first two axes, as shown in S4 Fig.
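The PCoA step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the classical (Gower) formulation in which the squared distances are double-centred before eigendecomposition, and the kinship matrix `K` is a hypothetical four-line toy example rather than data from the panel.

```python
import numpy as np


def pcoa(distance, n_axes=2):
    """Classical PCoA: double-centre -0.5 * D**2 and eigendecompose it."""
    d2 = np.asarray(distance, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    eigval, eigvec = np.linalg.eigh(gram)        # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_axes]      # indices of the largest eigenvalues
    eigval, eigvec = eigval[top], eigvec[:, top]
    coords = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return coords, eigval


# Hypothetical kinship for four lines: lines 0-1 are close, as are lines 2-3.
K = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
coords, eigval = pcoa(1.0 - K)   # D = 1 - K, as in the text
```

In this toy case the first coordinate separates the two pairs of lines, which mirrors how the first axes separate flint and dent lines in Fig 2.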

Fig 2

PCoA on genetic distances with coloration of individuals depending on their genetic background: dent, flint or admixed.

LD between pairs of loci was estimated separately in the dent and the flint datasets using the squared correlation r² between loci pairs. We only considered SNPs for which at least ten individuals carried the minor allele in both dent and flint datasets. For each group, LD was calculated and averaged for sets of loci pairs characterized by a similar physical distance ranging from 0 to 2 Mbp, considering a sliding window of 1 Kbp. The inter-group LD comparison revealed a higher LD extent in the dent than in the flint genetic group (S5 Fig), which was consistent with previous studies [13-16]. As suggested by [9], the persistence of LD linkage phases across flint and dent genetic groups was evaluated by computing the correlation between the r values estimated in each group, along the same sliding window of 1 Kbp. We also studied the consistency of LD linkage phases between groups by computing the correlation between the signs of r in the two groups, coding a negative r as “0” and a positive r as “1”. LD phases were very consistent over short physical distances but began to diverge dramatically when the loci were more than 100-200 Kbp apart (S6 Fig).
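The two LD statistics described above (r² within a group, and the persistence of linkage phase across groups) can be sketched as follows. This is only an illustration under simplifying assumptions: the genotype matrices are hypothetical toy data, the minor-allele filter and the averaging over 1 Kbp distance windows are omitted, and the function names are not taken from the study.

```python
import numpy as np


def pairwise_r(genotypes):
    """Correlation r between all pairs of loci (rows: lines, columns: loci, coded 0/1)."""
    return np.corrcoef(np.asarray(genotypes, dtype=float), rowvar=False)


def phase_persistence(r_group1, r_group2):
    """Correlate r values, and their signs (0 = negative, 1 = positive), across groups."""
    upper = np.triu_indices_from(r_group1, k=1)          # each locus pair once
    a, b = r_group1[upper], r_group2[upper]
    r_corr = np.corrcoef(a, b)[0, 1]
    sign_corr = np.corrcoef((a > 0).astype(float), (b > 0).astype(float))[0, 1]
    return r_corr, sign_corr


# Hypothetical genotypes: 6 lines x 3 loci in each group.
dent = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 0], [1, 1, 1]])
flint = dent.copy()
flint[:, 0] = 1 - flint[:, 0]    # flipping locus 0 reverses its linkage phase with the others

r_dent, r_flint = pairwise_r(dent), pairwise_r(flint)
r2_dent = r_dent ** 2            # LD measure used within a group
r_corr, sign_corr = phase_persistence(r_dent, r_flint)
```

Here both correlations are negative because two of the three locus pairs have opposite phases in the two toy groups; in the study these correlations were computed per distance window to produce S6 Fig.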

Phenotypic data

All the lines were evaluated per se at Saint-Martin-de-Hinx (France) in 2015 and 2016 for male flowering (MF) and female flowering (FF), in calendar days after sowing. Each trial was a latinized alpha design where every line was evaluated two times on average. Field trials were divided into two blocks of 33 sub-blocks, each comprising 36 plots. To avoid competition between genetic backgrounds, dent, flint and admixed lines were sown in different sub-blocks. Three check lines were repeated in all sub-blocks (B73, F353 and UH007). Each plot consisted of a row of 25 plants. MF and FF were measured as a median value within the whole plot. The contribution of Genotype x Environment (GxE) interactions to the phenotypic variance and the level of broad-sense heritability were investigated using the following model:

$$Y_{jklrc} = \mu + \beta_j + \alpha_k + G_{kl} + (G \times \beta)_{jkl} + X_{jr} + Z_{jc} + E_{jklrc}$$

where $Y_{jklrc}$ is the phenotype, $\mu$ is the intercept, $\beta_j$ is the fixed effect of trial j, $\alpha_k$ is the fixed effect of genetic background k (dent, flint, admixed, or the different checks: B73, F353 and UH007), $G_{kl}$ is the random genotype effect of line l in genetic background k (not for checks) with $\sigma^2_{G_k}$ being the genotypic variance in genetic background k, $(G \times \beta)_{jkl}$ is the random GxE interaction of line l in genetic background k for trial j, with $\sigma^2_{G \times \beta_{jk}}$ being the GxE variance in the genetic background k for trial j, $E_{jklrc}$ is the error with $\sigma^2_{E_j}$ being the error variance for trial j, and $X_{jr}$ and $Z_{jc}$ are the row and column random effects in trial j, respectively, as defined by the field design. All random effects are independent of each other. The row and column effects were modeled as independent or using an autoregressive model (AR1), as determined based on the AIC criterion (S1 Table). Least squares means, further referred to as phenotypes ($Y_{kl}$), were computed over the whole design using the same model, with genotypes as fixed effects:

$$Y_{jklrc} = \mu + \beta_j + \gamma_{kl} + X_{jr} + Z_{jc} + E_{jklrc}$$

where $\gamma_{kl}$ is the fixed genotype effect of line l in genetic background k. Model parameters were estimated using ASReml-R and restricted maximum likelihood (ReML) [49].

General polygenic model

In this study, the following general polygenic model was considered:

$$Y_{kl} = \mu + \alpha_k + G_{kl} + E_{kl} \qquad (1)$$

where $Y_{kl}$ is the phenotype (least squares mean) of line l in genetic background k among the N individuals of the sample, $\mu$ is the intercept, $\alpha_k$ is the genetic background effect with k ∈ {D, F, A} for dent, flint and admixed genetic background, respectively, $G_{kl}$ is the random genetic value of the line with $G \sim \mathcal{N}(0, \Sigma_G)$ being the concatenated vector of the genetic values in each genetic background, where $\Sigma_G$ has diagonal blocks $\sigma^2_{G_k} K_{kk}$ and off-diagonal blocks $\sigma_{G_k G_{k'}} K_{kk'}$, $K_{kk'}$ is the kinship matrix between individuals from genetic background k and k′ computed following Eq (2), $\sigma^2_{G_k}$ is the genetic variance in genetic background k, $\sigma_{G_k G_{k'}}$ is the genetic covariance between genetic background k and k′, $E_{kl}$ is the error associated with line l in genetic background k with $E_{kl} \sim \mathcal{N}(0, \sigma^2_E)$ independent and identically distributed, and $\sigma^2_E$ is the error variance. The kinship between line l from genetic background k and line l′ from genetic background k′, $K_{kl,k'l'}$, was computed following [50] as Eq (2), where $W_{lm}$ is the genotype of line l at locus m coded 0/1 and $f_{km}$ is the frequency of allele 1 at locus m in genetic background k. Note that Eq (2) simplifies to the kinship estimator proposed by [51] when l and l′ belong to the same genetic background.
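To make the idea of a kinship computed with group-specific allele frequencies concrete, here is a minimal sketch. The formula of Eq (2) is not reproduced in this text, so the normalisation used below is an assumption: genotypes are centred on the allele frequency of their own genetic background and the cross-products are scaled by the geometric mean of each background's summed f(1 − f). The genotype matrices and the function name `kinship_block` are hypothetical.

```python
import numpy as np


def kinship_block(w_k, w_kp, f_k, f_kp):
    """Kinship between lines of backgrounds k and k' (ASSUMED scaling, see lead-in).

    w_k, w_kp : genotype matrices (lines x loci), coded 0/1
    f_k, f_kp : frequency of allele 1 at each locus in each background
    """
    z_k = w_k - f_k                      # centre on own-background frequencies
    z_kp = w_kp - f_kp
    scale = np.sqrt(np.sum(f_k * (1 - f_k)) * np.sum(f_kp * (1 - f_kp)))
    return z_k @ z_kp.T / scale


# Hypothetical genotypes: 4 dent lines and 3 flint lines at 4 loci.
dent = np.array([[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]], dtype=float)
flint = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]], dtype=float)
f_dent, f_flint = dent.mean(axis=0), flint.mean(axis=0)

K_dd = kinship_block(dent, dent, f_dent, f_dent)      # within-background block
K_df = kinship_block(dent, flint, f_dent, f_flint)    # between-background block
```

With this scaling, the within-background block has an average diagonal of 1 when the frequencies are estimated from the same lines, which is the usual behaviour of a single-population estimator.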

GWAS models

In this study, three GWAS models were applied to different population samples (Table 1). The GWAS strategies were (i) to analyze dent and flint lines separately using a standard GWAS model M1, (ii) to analyze dent and flint lines jointly using a GWAS model M2 accounting for allele ancestry (confounded with the genetic background) and (iii) to analyze dent, flint and admixed lines using a GWAS model M3 accounting for both allele ancestry and the genetic background of the individuals. All models aimed at detecting a SNP effect, defined as a contrast effect between alleles 0 and 1 at a given SNP.

Table 1

Population sample to which each GWAS model was applied with the corresponding number of SNPs conserved for the analysis (at least 10 individuals carrying the minor allelic state).

	Dent	Flint	Dent + Flint	Dent + Flint + Admixed
M₁	✔ (247,759)	✔ (282,278)	✘	✘
M₂	-	-	✔ (288,093)	✘
M₃	-	-	-	✔(256,951)

✔: model was applied to the sample

✘: model was not applied to the sample but can theoretically be, provided the addition of a genetic background effect

- : model cannot be applied to the sample or would simplify into another model

Note that the number of SNPs in multi-group GWAS (M2 and M3) is higher than the minimum number of SNPs in single group GWAS (M1 (Dent)). SNPs carrying redundant information within a single group were indeed reduced to a single SNP for M1 and may no longer carry redundant information when datasets are pooled (M2 and M3).

Standard GWAS model M1

The first GWAS model M1 [1] was applied separately to the dent and flint datasets. For each SNP m among the M loci, the model includes the effect of SNP allele i at locus m (Table 2). All other terms are identical to those appearing in Eq (1), and the kinship was computed following Eq (2), which simplifies to the kinship estimator proposed by [51]. The existence of a SNP effect was tested using the hypothesis $\Delta^m = 0$, where $\Delta^m$ is the contrast between the effects of alleles 0 and 1 at SNP m.

Table 2

Allelic states observed in each GWAS model, resulting from a combination of SNP alleles, their ancestry and the genetic background in which they are observed.

	SNP	Ancestry	Genetic background	Allelic states
M₁	{0, 1}	-	-	{0, 1}
M₂	{0, 1}	{D, F}^a	-	{0D, 1D, 0F, 1F}
M₃	{0, 1}	{D, F}	{D, A, F}	{0DD, 1DD, 0DA, 1DA, 0FA, 1FA, 0FF, 1FF}

0: SNP reference allele

1: SNP alternative allele

D: Dent ancestry or genetic background

F: Flint ancestry or genetic background

A: Admixed genetic background

a confounded with the genetic background
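The allelic states of Table 2, together with the rule stated in the Table 1 caption (at least 10 individuals carrying the minor allelic state), can be sketched as below. This is an illustrative helper only: the function names, the input layout and the toy data are hypothetical, and the additional filter on redundant SNPs is not shown.

```python
from collections import Counter

# Allelic states expected under each model, as listed in Table 2.
EXPECTED_STATES = {
    "M1": {"0", "1"},
    "M2": {"0D", "1D", "0F", "1F"},
    "M3": {"0DD", "1DD", "0DA", "1DA", "0FA", "1FA", "0FF", "1FF"},
}


def allelic_state(allele, ancestry, background, model):
    """Label one individual at one SNP: allele (0/1), ancestry (D/F), background (D/A/F)."""
    if model == "M1":
        return f"{allele}"
    if model == "M2":
        return f"{allele}{ancestry}"      # ancestry is confounded with the background
    if model == "M3":
        return f"{allele}{ancestry}{background}"
    raise ValueError(f"unknown model: {model}")


def keep_snp(individuals, model, min_count=10):
    """Keep a SNP only if every expected allelic state is carried by >= min_count lines."""
    counts = Counter(allelic_state(a, anc, bg, model) for a, anc, bg in individuals)
    return all(counts.get(state, 0) >= min_count for state in EXPECTED_STATES[model])


# Hypothetical SNP with exactly 10 carriers of each of the eight M3 allelic states.
combos = [(a, anc, bg) for a in (0, 1) for anc, bg in (("D", "D"), ("D", "A"), ("F", "A"), ("F", "F"))]
balanced = [combo for combo in combos for _ in range(10)]
too_rare = balanced[1:]      # drop one carrier: one state now has only 9 lines
```

Because M3 splits the lines into eight states rather than two, the same threshold removes more SNPs, which is consistent with the note that a SNP significant under one model can be discarded under another.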

Multi-group GWAS model M2

We applied a multi-group GWAS model M2 jointly to the flint and dent datasets, specifying the allele ancestry (confounded with the genetic background). For a given SNP m, the model includes the effect of SNP allele i with ancestry j at locus m, as defined in Table 2. All other terms are identical to those appearing in Eq (1). At a given SNP, four hypotheses were tested on the dent and flint SNP effects $\Delta_D^m$ and $\Delta_F^m$. Hypotheses $\Delta_D^m = 0$ and $\Delta_F^m = 0$ test the existence of a dent and a flint SNP effect, respectively. Hypothesis $\Delta_{D+F}^m = 0$ tests for a general SNP effect while $\Delta_{D-F}^m = 0$ tests for a divergent SNP effect between the dent and flint ancestries.

Multi-group GWAS model M3

We applied a multi-group GWAS model M3 jointly to the flint, dent and admixed datasets, specifying the allele ancestry and the genetic background of the individual. For a given SNP m, the model includes the effect of SNP allele i with ancestry j at locus m in genetic background k, as defined in Table 2. All other terms are identical to those appearing in Eq (1). At a given SNP, 16 hypotheses were tested (Table 3). Hypotheses referred to as “simple” ($\Delta_{DD}^m$, $\Delta_{DA}^m$, $\Delta_{FA}^m$ and $\Delta_{FF}^m$) were tested to identify QTLs with a significant SNP effect for each combination of ancestries and genetic backgrounds. For instance, $\Delta_{DA}^m$ tests whether a dent SNP effect (differential effect between alleles 0 and 1 of dent ancestry) is significant in the admixed genetic background. Hypotheses referred to as “general” ($\Delta_{DD+FF}^m$, $\Delta_{DD+DA}^m$, $\Delta_{FF+FA}^m$, $\Delta_{DA+FA}^m$ and $\Delta_{DD+DA+FA+FF}^m$) were used to identify QTLs with a mean SNP effect over ancestries and genetic backgrounds. For instance, $\Delta_{FF+FA}^m$ tests for a general flint SNP effect in the flint and the admixed genetic backgrounds and $\Delta_{DD+DA+FA+FF}^m$ tests for a general SNP effect over ancestries and genetic backgrounds. Hypotheses referred to as “divergent” ($\Delta_{DD-FF}^m$, $\Delta_{DD-DA}^m$, $\Delta_{FF-FA}^m$, $\Delta_{DA-FA}^m$, $\Delta_{(DD+DA)-(FF+FA)}^m$, $\Delta_{(DD+FF)-(DA+FA)}^m$ and $\Delta_{(DD-DA)-(FF-FA)}^m$) were tested to identify QTLs with a contrasted SNP effect between ancestries and/or genetic backgrounds. For instance, $\Delta_{DD-DA}^m$ tests for a divergent dent SNP effect between the dent and the admixed genetic backgrounds, which amounts to testing an epistatic interaction between the SNP and the genetic background (see S1 Appendix for details).

Table 3

Linear combinations tested with M3 compared to hypotheses tested using other GWAS models (M1 and M2).

	Type	ΔDDm ^a	ΔDAm ^b	ΔFAm ^c	ΔFFm ^d	M₁	M₂
ΔDDm	simple	+1	0	0	0	✔	✔
ΔDAm	simple	0	+1	0	0	-	-
ΔFAm	simple	0	0	+1	0	-	-
ΔFFm	simple	0	0	0	+1	✔	✔
ΔDD+FFm	general	+1	0	0	+1	-	✔
ΔDD+DAm	general	+1	+1	0	0	-	-
ΔFF+FAm	general	0	0	+1	+1	-	-
ΔDA+FAm	general	0	+1	+1	0	-	-
ΔDD+DA+FA+FFm	general	+1	+1	+1	+1	-	-
ΔDD−FFm	divergent	+1	0	0	-1	-	✔
ΔDD−DAm	divergent	+1	-1	0	0	-	-
ΔFF−FAm	divergent	0	0	-1	+1	-	-
ΔDA−FAm	divergent	0	+1	-1	0	-	-
Δ(DD+DA)−(FF+FA)m	divergent	+1	+1	-1	-1	-	-
Δ(DD+FF)−(DA+FA)m	divergent	+1	-1	-1	+1	-	-
Δ(DD−DA)−(FF−FA)m	divergent	+1	-1	+1	-1	-	-

✔: hypothesis also tested using the corresponding GWAS model

- : hypothesis not tested using the corresponding GWAS model

From a biological standpoint, a QTL with contrasted SNP effects between groups can be caused by (i) a local genomic difference due to a group-specific genetic mutation for all or part of the lines and/or to group differences in LD or (ii) an interaction with the genetic background. Under the first hypothesis, one expects that the effect of a SNP depends on its ancestry but not on the genetic background (admixed or pure, see Fig 3a). Under the second hypothesis, one expects a SNP effect, for a given ancestry, to vary depending on the genetic background. One example would be a QTL with a strong SNP effect in a dent genetic background, but none in the flint genetic background, while the SNP effects would be of intermediate size for alleles of both ancestries in the admixed genetic background (see Fig 3b). Note that other complex configurations are possible, justifying the inclusion of all tests in the analysis.
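The two biological scenarios can be related to the divergent contrasts of Table 3 with a small numerical sketch. The contrast coefficients below are copied from Table 3 (columns ordered DD, DA, FA, FF); the effect sizes are hypothetical values chosen only to mimic the two configurations described above, and the code evaluates the linear combinations without performing any statistical test.

```python
import numpy as np

# Divergent contrasts from Table 3; columns are the SNP effects DD, DA, FA, FF.
DIVERGENT = {
    "DD-FF": [1, 0, 0, -1],
    "DD-DA": [1, -1, 0, 0],
    "FF-FA": [0, 0, -1, 1],
    "DA-FA": [0, 1, -1, 0],
    "(DD+DA)-(FF+FA)": [1, 1, -1, -1],
    "(DD+FF)-(DA+FA)": [1, -1, -1, 1],
    "(DD-DA)-(FF-FA)": [1, -1, 1, -1],
}


def evaluate(effects):
    """Value of each divergent linear combination for SNP effects (DD, DA, FA, FF)."""
    effects = np.asarray(effects, dtype=float)
    return {name: float(np.dot(coef, effects)) for name, coef in DIVERGENT.items()}


# (a) Local genomic difference: the effect follows the ancestry, not the background.
local = evaluate([2.0, 2.0, 0.0, 0.0])       # hypothetical effect sizes
# (b) Background interaction: strong in dent, absent in flint, intermediate when admixed.
background = evaluate([2.0, 1.0, 1.0, 0.0])  # hypothetical effect sizes
```

In scenario (a) the contrasts comparing pure and admixed backgrounds (DD−DA and FF−FA) are zero while DA−FA is not; in scenario (b) the pattern is reversed, which is how admixed lines help separate the two explanations.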

Fig 3

Schematic of allele effects when divergent SNP effects are observed between groups, depending on the biological hypothesis: (a) local genomic difference between groups (LD or mutation) and (b) allele effects interacting with the genetic background.

The denomination of the allelic states on the x-axis includes the SNP allele (0/1), its ancestry (D/F) and the genetic background in which it is observed (D/A/F), as presented in Table 2.

For the three GWAS models, a SNP was discarded if its minor allelic state, as defined in Table 2, was carried by fewer than 10 individuals, or if it carried redundant genetic information (genetic information identical to that of another SNP already included in the dataset). To avoid prohibitive computational times, a two-step strategy was adopted for the inference of models M2 and M3. In a first step, the parameters of the “null” model of Eq (1) were estimated. The variance parameters were then plugged into their respective covariance matrices in order to derive a genetic covariance matrix and an error covariance matrix. In a second step, a model was fitted that included SNP fixed effects, as defined in M2 (or M3), and two random effects (one genetic effect and one error effect) with these two covariance matrices, respectively. Note that this strategy corresponds to fitting M2 (or M3) while keeping some variance ratios fixed to their respective values obtained in the “null” model. Model parameters were estimated using ReML and the linear combinations of fixed effects were tested using Wald tests, both implemented in the R-package MM4LMM [52]. P-values were computed using the asymptotic null distribution of the Wald statistic, as presented in [4]. The false discovery rate (FDR) was controlled by applying the procedure of [53] jointly to the whole set of tests defined by each GWAS strategy, and repeatedly for each trait. All GWAS strategies were evaluated for their ability to control type I error and for their statistical power, using simulated phenotypes. Results are presented in S2 Appendix. 
In general, all models correctly controlled for false positives, and a higher power was observed for multi-group models, notably due to their ability to identify QTLs with complex configurations of effects. For a given hypothesis tested, significant SNPs were clustered into QTLs if they were located within a physical window of 3 Mbp, leading to an LD below 0.05 between markers of different QTLs.
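The post-processing of test results (FDR control, then clustering of significant SNPs into QTLs) can be sketched as follows. Two assumptions are made and should be read as such: the FDR procedure of [53] is taken here to be the Benjamini-Hochberg step-up procedure, and SNPs are chained into one QTL when consecutive significant SNPs on the same chromosome are at most 3 Mbp apart, which is only one possible reading of the 3 Mbp window. All p-values and positions are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues, fdr=0.20):
    """Boolean mask of tests declared significant (ASSUMED to match the procedure of [53])."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    passed = p[order] <= fdr * np.arange(1, n + 1) / n
    significant = np.zeros(n, dtype=bool)
    if passed.any():
        last = np.nonzero(passed)[0].max()       # largest rank meeting its threshold
        significant[order[: last + 1]] = True
    return significant


def cluster_into_qtls(snps, window_bp=3_000_000):
    """Chain significant SNPs, given as (chromosome, position in bp), into QTLs."""
    qtls = []
    for chrom, pos in sorted(snps):
        last = qtls[-1] if qtls else None
        if last and last["chrom"] == chrom and pos - last["end"] <= window_bp:
            last["end"] = pos                    # extend the current QTL
            last["n_snps"] += 1
        else:
            qtls.append({"chrom": chrom, "start": pos, "end": pos, "n_snps": 1})
    return qtls


# Hypothetical test results for four SNPs.
pvals = [0.001, 0.04, 0.03, 0.9]
positions = [(1, 1_000_000), (1, 2_500_000), (1, 9_000_000), (2, 1_200_000)]
mask = benjamini_hochberg(pvals, fdr=0.20)
qtls = cluster_into_qtls([pos for pos, keep in zip(positions, mask) if keep])
```

In the study the FDR procedure was applied jointly to all tests of a given GWAS strategy, so `pvalues` would hold every hypothesis at every SNP for that strategy, while the clustering was done separately for each hypothesis.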

Results

Associations detected and comparison of GWAS strategies

We observed substantial phenotypic variability within the dent, flint and admixed genetic backgrounds for both traits. The variance components estimated in the phenotypic analysis are summarized in S1 Table. GxE variances were limited and the broad-sense heritabilities were high for each genetic background, ranging from 0.88 in the admixed lines to 0.96 in the dent and flint lines for both MF and FF. The model parameters estimated using the general polygenic model of Eq (1) are presented in S2 Table and showed a larger genetic variance in the dent compared to the flint and admixed genetic backgrounds. For each GWAS model, two levels of FDR were used to declare a SNP as significantly associated: 5% and 20%. The number of significant SNPs detected and the corresponding number of QTLs are summarized in Table 4 for both traits. The location of QTLs detected using a FDR of 20% is represented along the genome in Fig 4 for MF and in S7 Fig for FF. All associations are listed in S3 and S4 Tables. Note that some SNPs were declared significant by one model but were discarded with another model because of the filtering on the frequency of each allelic state.

Table 4

Number of SNPs associated with each trait, depending on the GWAS strategy, using a FDR of 5% and 20%.

The number of corresponding QTLs is also indicated.

	MF				FF
	5%		20%		5%		20%
	SNP	QTL	SNP	QTL	SNP	QTL	SNP	QTL
M₁ ^a	7	2	56	24	8	3	38	14
Δ^m (Dent)	4	1	35	12	4	1	22	6
Δ^m (Flint)	3	1	21	13	4	2	16	8
M₂ ^a	6	2	10	5	6	2	9	5
ΔDm	4	1	5	4	4	1	4	1
ΔFm	2	1	4	2	2	1	4	3
ΔD+Fm	1	1	3	2	2	1	2	1
ΔD−Fm	-	-	-	-	-	-	1	1
M₃ ^a	3	2	56	17	-	-	13	5
ΔDDm	1	1	41	1	-	-	4	1
ΔDAm	-	-	1	1	-	-	-	-
ΔFAm	2	1	9	1	-	-	1	1
ΔFFm	-	-	1	1	-	-	-	-
ΔDD+FFm	-	-	9	3	-	-	3	2
ΔDA+DDm	-	-	5	3	-	-	-	-
ΔFF+FAm	-	-	3	2	-	-	-	-
ΔDA+FAm	-	-	11	4	-	-	1	1
ΔDD+DA+FA+FFm	-	-	19	5	-	-	16	1
ΔDD−FFm	-	-	6	1	-	-	-	-
ΔDD−DAm ^b	-	-	-	-	-	-	-	-
ΔFF−FAm ^b	-	-	2	2	-	-	-	-
ΔDA−FAm	-	-	4	4	-	-	-	-
Δ(DD+DA)−(FF+FA)m	-	-	2	2	-	-	-	-
Δ(DD+FF)−(DA+FA)m ^b	-	-	-	-	-	-	1	1
Δ(DD−DA)−(FF−FA)m ^b	-	-	1	1	-	-	-	-

a number of SNPs detected over the set of tests (a given SNP can be detected using different tests)

b hypothesis testing an interaction between the QTL and the genetic background

Fig 4

Position of QTLs detected with (a) M1, (b) M2 and (c) M3 for MF using a FDR of 20%.

The size of the grey dots is proportional to the -log10(pval) of the test at the most significant SNP of the region. Red vertical lines correspond to the location of the QTLs presented in section “Highlighted QTLs”. Note that major QTLs detected by a model may be discarded with another model because of filtering on allele frequencies.

First, a standard GWAS model M1 was applied separately to the dent and the flint datasets. Based on a 20% FDR, 35 SNPs were associated with MF in the dent dataset while 21 SNPs were associated in the flint dataset. These SNPs can be clustered into 12 QTLs in the dent dataset and into 13 QTLs in the flint dataset. Interestingly, none of these SNPs were detected in both datasets and they only pointed to one common QTL between datasets, which was located in the vicinity of Vgt2 on chromosome 8 [15].

Secondly, dent and flint datasets were analyzed jointly using model M2, which takes into account the dent or flint ancestry of the allele. Note that the allele ancestry is confounded with the genetic background in this model. Based on a 20% FDR, 10 SNPs were associated with MF and were significant for (5 SNPs), (4 SNPs) and (3 SNPs). Some SNPs displayed more than one significant test, which explains why the total number of SNPs over the four tests did not sum to 10. These SNPs can be clustered into 5 QTLs that were significant for (4 QTLs), (2 QTLs) and (2 QTLs). Some QTLs were already detected using M1, such as the QTL located in the vicinity of Vgt3 on chromosome 3 [54, 55] detected in the dent dataset. Other QTLs were specific to M2, like the QTL located on chromosome 1 detected using Δ^m_{D−F} for FF, or specific to M1, such as the QTL located on chromosome 2 detected in the flint dataset. Based on a 20% FDR, a larger number of QTLs was detected with M1 compared to M2 for both traits.

Finally, the dent, flint and admixed lines were analyzed jointly using model M3, which distinguished the allele ancestry and the genetic background. The existence of a dent SNP effect was tested in the dent (Δ^m_{DD}) and in the admixed genetic backgrounds (Δ^m_{DA}), and similarly for the flint SNP effect (Δ^m_{FF} and Δ^m_{FA}). Several hypotheses on general and divergent SNP effects were also tested between ancestries and genetic backgrounds (Table 3). Based on a 20% FDR, 56 SNPs were associated with MF and were significant for (19 SNPs), (2 SNPs), (4 SNPs) and others. These SNPs can be clustered into 17 QTLs that were significant for (5 QTLs), (2 QTLs), (4 QTLs) and others. Some of the QTLs were already detected using M1 and M2, such as the QTL located in the vicinity of Vgt3 on chromosome 3, while several QTLs were specific to M3, such as a QTL detected on chromosome 2. Several QTLs were detected as showing a divergent SNP effect, including hypotheses testing an interaction with the genetic background. Based on a 20% FDR, a similar number of QTLs was detected using M1 and M3 for MF, and M3 was intermediate between M1 and M2 for FF.
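The hypotheses tested with the model that separates allele ancestry from genetic background can be read as linear contrasts on the eight allelic states (0DD, 1DD, 0DA, 1DA, 0FA, 1FA, 0FF, 1FF). The following sketch is illustrative only. It assumes that each elementary SNP effect is the difference between allele 1 and allele 0 within one ancestry/background combination, that the names of the composite hypotheses can be taken literally as sums and differences of these elementary effects, and that the Wald statistic is referred to a chi-square distribution with one degree of freedom. The effect estimates, their covariance matrix and all function names are hypothetical and are not taken from the study.

```python
import math
import numpy as np

# Order of the eight allelic states: SNP allele (0/1), ancestry, background.
STATES = ["0DD", "1DD", "0DA", "1DA", "0FA", "1FA", "0FF", "1FF"]

def elementary(combo):
    """Contrast vector (allele 1 minus allele 0) for one ancestry/background combo."""
    c = np.zeros(len(STATES))
    c[STATES.index("1" + combo)] = 1.0
    c[STATES.index("0" + combo)] = -1.0
    return c

DD, DA, FA, FF = (elementary(k) for k in ("DD", "DA", "FA", "FF"))

# A few hypotheses written as sums/differences of elementary effects (assumed reading).
CONTRASTS = {
    "DD": DD,                                # dent allele effect, dent background
    "DD+DA+FA+FF": DD + DA + FA + FF,        # general SNP effect
    "DD-FF": DD - FF,                        # divergent effect between pure groups
    "(DD-DA)-(FF-FA)": (DD - DA) - (FF - FA) # interaction with the background
}

def wald_test(contrast, beta, cov):
    """Return (estimate, Wald statistic, p-value) for one linear contrast."""
    est = float(contrast @ beta)
    var = float(contrast @ cov @ contrast)
    stat = est ** 2 / var
    pval = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return est, stat, pval

# Hypothetical estimates of the eight allelic-state effects and their covariance.
beta = np.array([0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0])
cov = 0.1 * np.eye(len(STATES))

for name, c in CONTRASTS.items():
    est, stat, pval = wald_test(c, beta, cov)
    print(f"{name:>16s}  estimate={est:5.2f}  Wald={stat:6.2f}  -log10(p)={-math.log10(pval):5.2f}")
```

In this toy configuration the allele effect shrinks from the dent to the flint background, so both the general contrast and the divergent contrasts are non-zero, which is the kind of pattern that requires several tests to be interpreted jointly.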

Highlighted QTLs

Among the 17 QTLs detected for MF with M3, six QTLs were selected and studied in further detail. These QTLs had (i) at least one significant test among the M3 hypotheses based on an FDR of 20%, and (ii) a large frequency for each allele, with a minimum of 23 lines carrying the minor allelic state (Vgt1). Among them, SNPs were located in the vicinity of known maize flowering QTLs: Vgt1 [22, 47, 48], Vgt2 [15] and Vgt3 [54, 55]. For all QTLs, information concerning their physical position along the genome, the frequency of each allelic state and their -log10(pval) at each test is summarized in Table 5. The distribution of the phenotypes is illustrated for each allele after adjusting for the variation due to the polygenic background in Fig 5, and their location along the genome is indicated by red vertical lines in Fig 4.
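As an illustration of how a 20% FDR threshold translates into a set of significant tests, the sketch below implements the Benjamini-Hochberg step-up procedure. This is an assumption made for illustration: the text above only states the FDR level, not the procedure used to control it. The p-values and the function name are hypothetical.

```python
def benjamini_hochberg(pvals, fdr=0.20):
    """Return the sorted indices of the tests declared significant at the given FDR."""
    m = len(pvals)
    # Indices of the tests ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is below rank * fdr / m.
        if pvals[idx] <= rank * fdr / m:
            k_max = rank
    return sorted(order[:k_max])

# Hypothetical p-values for five tests.
pvals = [0.001, 0.30, 0.04, 0.08, 0.9]
print(benjamini_hochberg(pvals, fdr=0.20))
print(benjamini_hochberg(pvals, fdr=0.05))
```

A looser FDR level retains more tests than a stricter one, which is why some associations can be reported at 20% FDR but not at 5% FDR.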

Table 5

Information regarding the six highlighted QTLs.

	Vgt1	Vgt2	Vgt3	QTL4.1	QTL2.1	QTL7.2
Trait	MF	MF	MF	MF	MF	MF
SNP	AX-91103145	AX-91100620	AX-91583310	AX-91218190	AX-90601996	AX-91744673
Chromosome	8	8	3	4	2	7
Position (Mbp)	132.53	123.50	158.97	31.10	7.04	173.73
Allele frequency (number of lines)
0DD	242	230	97	115	75	243
1DD	58	70	203	185	225	57
0DA	138	119	48	53	50	161
1DA	41	58	141	127	134	30
0FA	164	81	92	107	74	113
1FA	23	108	85	79	108	62
0FF	238	162	158	161	102	210
1FF	66	142	146	143	202	94
-log₁₀(pval)
M₁
Δ^m (Dent)	1.85	4.26 *	10.99 ***	4.96 *	0.05	1.00
Δ^m (Flint)	2.36 .	2.74 .	0.88	0.31	1.24	1.20
M₂
Δ^m_{D}	2.03 .	4.19 *	9.42 ***	3.51 *	0.02	0.98
Δ^m_{F}	2.15 .	2.55	1.20	2.42 .	1.36	0.91
Δ^m_{D+F}	0.00	6.04 **	7.81 ***	0.49	0.68	0.11
Δ^m_{D−F}	3.83 *	0.54	3.20 *	5.54 **	0.77	1.60
M₃
Δ^m_{DD}	2.44 .	4.06 *	8.69 ***	3.81 *	0.07	1.53
Δ^m_{DA}	3.66 *	3.23 *	1.63	1.30	0.43	3.14 *
Δ^m_{FA}	1.31	1.30	2.29 .	0.97	8.99 ***	0.18
Δ^m_{FF}	1.78	2.96 .	0.92	3.18 *	1.31	1.41
Δ^m_{DD+FF}	0.21	6.30 **	7.11 ***	0.40	0.77	0.18
Δ^m_{DD+DA}	4.39 *	5.23 **	6.09 **	3.30 *	0.19	0.41
Δ^m_{FF+FA}	2.09 .	2.63 .	2.10 .	2.49 .	6.42 **	0.43
Δ^m_{DA+FA}	0.47	3.65 *	3.32 *	0.19	3.07 *	1.72
Δ^m_{DD+DA+FA+FF}	0.44	6.68 **	6.81 **	0.36	2.53 .	0.66
Δ^m_{DD−FF}	3.90 *	0.47	3.47 *	6.59 **	0.59	2.56 .
Δ^m_{DD−DA} ^b	0.33	0.19	2.28 .	0.67	0.43	5.43 **
Δ^m_{FF−FA} ^b	0.02	0.52	0.77	0.60	3.98 *	1.51
Δ^m_{DA−FA}	4.13 *	0.71	0.06	1.94	5.44 **	2.41 .
Δ^m_{(DD+DA)−(FA+FF)}	5.96 **	0.77	1.29	5.38 **	3.64 *	0.03
Δ^m_{(DD+FF)−(DA+FA)} ^b	0.19	0.49	0.68	0.10	1.08	1.83
Δ^m_{(DD−DA)−(FF−FA)} ^b	0.23	0.11	2.56 .	1.03	2.78 .	6.20 **

***: -log10(pval) > 7;

**: 7 > -log10(pval) > 5;

*: 5 > -log10(pval) > 3;

.: 3 > -log10(pval) > 2

^b: hypothesis testing an interaction between the QTL and the genetic background
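The legend above can be expressed as a small helper that maps a -log10(pval) value to its significance symbol. This is a minimal sketch. The function name is hypothetical, and the treatment of values falling exactly on a threshold (2, 3, 5 or 7) is an assumption, since the legend only defines strict inequalities.

```python
def significance_symbol(neglog10_p):
    """Map a -log10(p-value) to the symbol used in the table legend."""
    if neglog10_p > 7:
        return "***"
    if neglog10_p > 5:
        return "**"
    if neglog10_p > 3:
        return "*"
    if neglog10_p > 2:
        return "."
    return ""  # not flagged

def to_pvalue(neglog10_p):
    """Convert a -log10(p-value) back to the p-value scale."""
    return 10.0 ** (-neglog10_p)

# Values taken from the Vgt3 and Vgt1 columns of the table.
for value in (10.99, 6.09, 3.47, 2.28, 0.88):
    print(f"{value:5.2f}  {significance_symbol(value):3s}  p = {to_pvalue(value):.2e}")
```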

Fig 5

Boxplots of phenotypes adjusted for polygenic background variation using relatedness (MF K corrected) for the different alleles of the six highlighted QTLs: (a) Vgt1, (b) Vgt2, (c) Vgt3, (d) QTL4.1, (e) QTL2.1 and (f) QTL7.2 using M3.

The denomination of the allelic states on the x-axis includes the SNP allele (0/1), its ancestry (D/F) and the genetic background in which it was observed (D/A/F), as presented in Table 2.


In Table 5, the -log10(pval) of M2 and M3 were obtained by training the complete GWAS models with all the genetic components presented in Eq (1) on the six SNPs that were previously detected using the approximate model.

The SNP matching the Vgt1 region on chromosome 8 was detected as associated with MF (20% FDR) using Δ^m_{(DD+DA)−(FA+FF)} (-log10(pval) = 5.96) in M3. This QTL showed a contrasted effect between alleles of different ancestries with an apparent inversion of effects (Fig 5a). This observation was supported by a high -log10(pval) for the tests related to a divergent SNP effect between ancestries: Δ^m_{D−F} (3.83), Δ^m_{DD−FF} (3.90), Δ^m_{DA−FA} (4.13) and Δ^m_{(DD+DA)−(FA+FF)} (5.96). Conversely, a low -log10(pval) was detected for the tests that would otherwise have suggested an interaction with the genetic background (tests marked b in Table 5). These results support the existence of a local genomic difference at Vgt1 between the dent and the flint genetic groups for MF, but no interaction with the genetic background.

The SNP matching the Vgt2 region on chromosome 8 was detected as associated with MF (20% FDR) using Δ^m_{DD+DA+FA+FF} (-log10(pval) = 6.68) in M3. This QTL showed a conserved effect across ancestries and genetic backgrounds (Fig 5b). This observation was supported by a high -log10(pval) for tests related to a general SNP effect: Δ^m_{D+F} (6.04), Δ^m_{DD+FF} (6.30), Δ^m_{DD+DA} (5.23), Δ^m_{DA+FA} (3.65) and Δ^m_{DD+DA+FA+FF} (6.68), and a low -log10(pval) for tests related to divergent SNP effects (all below 1).

The SNP matching the Vgt3 region on chromosome 3 was detected as associated with MF (5% FDR) using Δ^m_{DD} (-log10(pval) = 8.69) in M3. This QTL showed a large effect in the dent genetic background, a medium effect in the admixed genetic background regardless of the allele ancestry and a small effect in the flint genetic background (Fig 5c). This observation was supported by a high -log10(pval) for the tests related to the dent SNP effect in the dent genetic background: Δ^m in M1 (Dent, 10.99), Δ^m_{D} in M2 (9.42) and Δ^m_{DD} in M3 (8.69), and a low -log10(pval) for the tests related to the flint SNP effect in a flint genetic background. As for Vgt2, a high -log10(pval) was also detected for tests related to a general SNP effect: Δ^m_{D+F} (7.81), Δ^m_{DD+FF} (7.11), Δ^m_{DD+DA} (6.09) and Δ^m_{DD+DA+FA+FF} (6.81), but a high -log10(pval) was detected for the test related to a divergent SNP effect between the dent and the flint genetic backgrounds: Δ^m_{DD−FF} (3.47). There was also a high -log10(pval) for a divergent dent SNP effect between different genetic backgrounds: Δ^m_{DD−DA} (2.28). All these results support the existence of a QTL effect that tends to be higher when the dent genome proportion increases within individuals. This suggests that Vgt3 interacts with the genetic background for MF.

The SNP matching a region further referred to as QTL4.1 on chromosome 4 was detected as associated with MF (20% FDR) using Δ^m_{DD−FF} (-log10(pval) = 6.59) in M3. This QTL is very similar to Vgt1, as it showed a contrasted effect between alleles of different ancestries with an apparent inversion of effects (Fig 5d). This observation was supported by a high -log10(pval) for the tests related to a divergent SNP effect between ancestries: Δ^m_{D−F} (5.54), Δ^m_{DD−FF} (6.59) and Δ^m_{(DD+DA)−(FA+FF)} (5.38). These results support the existence of a local genomic difference at QTL4.1 between the dent and the flint genetic groups for MF, but no interaction with the genetic background.

The SNP matching a region further referred to as QTL2.1 on chromosome 2 was detected as associated with MF (5% FDR) using Δ^m_{FA} (-log10(pval) = 8.99) in M3. This QTL showed a flint effect in the admixed genetic background (Fig 5e), which was supported by a high -log10(pval) for the test Δ^m_{FA} (8.99). Although there was a high -log10(pval) for a general flint SNP effect across genetic backgrounds, Δ^m_{FF+FA} (6.42), a high -log10(pval) was observed for a divergent SNP effect between those same alleles: Δ^m_{FF−FA} (3.98). A high -log10(pval) was also observed for a divergent SNP effect between different ancestries in the admixed genetic background: Δ^m_{DA−FA} (5.44). All these results support the existence of a QTL effect existing only for alleles of flint ancestry in the admixed genetic background. This suggests that QTL2.1 is specific to the flint ancestry and interacts with the genetic background for MF.

The SNP matching a region further referred to as QTL7.2 on chromosome 7 was detected as associated with MF (20% FDR) using Δ^m_{(DD−DA)−(FF−FA)} (-log10(pval) = 6.20) in M3. This QTL showed contrasted dent effects between the dent and the admixed genetic backgrounds (Fig 5f). This observation was supported by a high -log10(pval) for the test related to a divergent dent SNP effect between genetic backgrounds: Δ^m_{DD−DA} (5.43). A high -log10(pval) was also observed for the hypothesis testing the equality between the divergent dent SNP effect and the divergent flint SNP effect: Δ^m_{(DD−DA)−(FF−FA)} (6.20). All these results support the existence of a QTL with opposite effects between the dent and the admixed genetic backgrounds. This suggests that QTL7.2 interacts with the genetic background for MF.

Discussion

Accounting for genetic groups in GWAS

The stratification of the population sample into distinct genetic groups is a common feature in GWAS that challenges the methods used to detect QTLs. A simple way to deal with genetic groups is to analyze them separately. In our study, a standard GWAS model M1 was applied separately to the dent and the flint datasets. Among the QTLs detected for MF, only one was detected in both dent and flint datasets, and not at the same SNPs, while none were detected in common for FF. One may question whether observing such differences between datasets indicated group-specific allele effects, or simply group differences in terms of statistical power due to a difference in allele frequency. This question often arises when GWAS is applied separately to genetic groups, as in maize [16, 56] or dairy cattle [57, 58], and is very difficult to answer except for obvious configurations such as associations at SNPs segregating only in one group.

Another way to handle genetic groups is to analyze them jointly. One possibility is to apply model M1 while specifying genetic structure as a global fixed effect, in order to prevent the detection of spurious associations. In dairy cattle, this strategy generally improved the precision concerning QTL locations by taking advantage of the low LD extent observed in multi-group datasets. However, while [34] and [33] observed a gain in statistical power due to a larger population size, [32] detected fewer QTLs by combining breeds compared to separate analyses. They attributed this finding to the limited number of QTLs segregating within both Holstein and Jersey breeds, but also reported that QTLs detected in both breeds showed only small to medium correlations between within-breed estimates of SNP effects (e.g. 0.082 for milk yield). Obviously, applying M1 jointly to genetic groups does not directly address the problem of whether QTL effects are conserved or not between genetic groups.
A model specifying group-specific allele effects was referred to as M2 in this study. As with M1, the existence of a SNP effect can be tested for each group, but M2 also allows one to test the existence of general and divergent SNP effects between groups. In our study, this model allowed us to test for a dent (Δ^m_{D}) and a flint (Δ^m_{F}) SNP effect, along with a general (Δ^m_{D+F}) and a divergent (Δ^m_{D−F}) SNP effect between flint and dent ancestries. Note that testing Δ^m_{D+F} is similar, although not strictly equivalent, to testing a SNP effect by applying M1 to a multi-group dataset. Using Δ^m_{D+F} in M2, the same weights are given to allelic contrasts in the two groups. Applying M1 to a multi-group dataset would only be equivalent to applying M2 when considering markers with identical allele frequencies in the two groups. Using the hypotheses specifically tested in M2 (Δ^m_{D+F} and Δ^m_{D−F}), it was possible to detect new QTLs that were not detected with M1. In particular, a QTL detected on chromosome 1 for FF had a divergent SNP effect between the dent and flint genetic groups, suggesting the existence of group-specific QTL effects in this dataset. Some QTLs were detected in common with M1 but each strategy allowed the detection of specific QTLs, demonstrating the complementarity between the models. In conclusion, M2 was efficient in identifying QTLs with either conserved or specific allele effects between ancestries, but observing group-specific allele effects provided little insight regarding the cause of this specificity. Admixed individuals helped to tackle this issue.
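The remark on weights can be illustrated in a deliberately simplified setting: ordinary least squares with one intercept per group, no kinship and no residual noise. Under these assumptions, which are made only for illustration and do not reproduce the mixed models of the study, a single SNP effect fitted on the pooled dataset is the average of the group-specific allelic contrasts weighted by N_g p_g (1 − p_g), where N_g is the group size and p_g the allele frequency in group g. It therefore coincides with the equal-weight average only when these weights are equal, for instance with identical group sizes and allele frequencies. All names and values below are hypothetical.

```python
import numpy as np

def simulate_group(n, freq, intercept, effect):
    """Noise-free phenotypes for one group with a given allele-1 frequency."""
    n1 = int(round(n * freq))
    x = np.array([1.0] * n1 + [0.0] * (n - n1))
    y = intercept + effect * x
    return x, y

def pooled_slope(groups):
    """Common SNP slope from OLS with one intercept per group."""
    rows, ys = [], []
    for g, (x, y) in enumerate(groups):
        for xi, yi in zip(x, y):
            row = [0.0] * len(groups) + [xi]  # group indicators, then the SNP
            row[g] = 1.0
            rows.append(row)
            ys.append(yi)
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(ys), rcond=None)
    return float(coef[-1])

# Hypothetical group-specific effects: 2.0 in "dent", 0.0 in "flint".
unequal = [simulate_group(300, 0.2, 10.0, 2.0), simulate_group(300, 0.5, 12.0, 0.0)]
equal = [simulate_group(300, 0.5, 10.0, 2.0), simulate_group(300, 0.5, 12.0, 0.0)]

print("equal-weight average of the two contrasts:", (2.0 + 0.0) / 2)
print("pooled slope, different allele frequencies:", round(pooled_slope(unequal), 4))
print("pooled slope, identical allele frequencies:", round(pooled_slope(equal), 4))
```

With different allele frequencies, the pooled estimate is pulled toward the group in which the SNP is closer to a frequency of 0.5, whereas the equal-weight contrast is not.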

Benefits from admixed individuals

Admixed individuals were generated for this study by mating pure individuals of each group according to a sparse factorial design. Integrating these admixed individuals in GWAS can be done by simply analyzing the joint multi-group dataset using M1 or M2, which may lead to a gain in statistical power due to an increase in population size. More interestingly, admixed individuals can be used to disentangle the factors causing the heterogeneity of allele effects across groups. We developed model M3 to distinguish the allele ancestry (dent/flint) and the genetic background (dent/flint/admixed). As shown using simulations (S2 Appendix), applying M3 should result in a gain in statistical power by (i) testing an overall SNP effect for SNPs with conserved effects across ancestries and/or genetic backgrounds, and (ii) testing hypotheses for complex configurations between allele effects. When applied to MF, 17 QTLs were detected (20% FDR). While many of these QTLs were previously detected using M1 and M2, the new hypotheses tested allowed us to discover new interesting regions. For equivalent tests in M1, M2 and M3 (e.g. Δ^m (Dent) in M1, Δ^m_{D} in M2 and Δ^m_{DD} in M3), the lower number of associations detected with M2 and M3 compared to M1 for real traits can be attributed to a different filtering on allele frequencies, the use of an approximate model for M2 and M3, and the randomness associated with a particular experiment. Regarding false positive control, the QQ-plots of the test p-values of M1, M2 and M3 did not show particular problems, as presented for MF in S8, S9 and S10 Figs and for FF in S11, S12 and S13 Figs.

The idea of exploiting admixed individuals has been proposed in the creation of NAM [40] and MAGIC [41] populations. Compared to our approach, such experimental populations include a limited number of founders, generally selected in different genetic groups. This is beneficial to increase the power of detection for alleles that were rare in parental groups. However, these populations cannot address the question of the epistatic interaction with the genetic background of the original groups. Our approach and the NAM and MAGIC designs are therefore expected to have complementary properties.

Heterogeneity of maize flowering QTL allele effects

From a global perspective, a high number of QTLs have been detected in previous maize studies [16, 22, 37, 59, 60]. When evaluating the American and European NAMs, [22] and [61] showed that flowering time is a trait controlled by a large number of QTLs, many of which display variable effects across individual recombinant populations. Our study consistently highlights a high number of QTLs and confirms a large variation in allele effects. It provides further elements on the origin of this variation, by identifying QTLs affected by local genomic differences, epistasis with the genetic background, or both.

When doing GWAS in a multi-group population, geneticists generally assume that QTL effects are conserved between groups. Such QTLs were detected in our study, with the example of the SNP associated with MF in the vicinity of Vgt2 [15] and its candidate gene: the flowering activator ZCN8 [62-64] on chromosome 8. At this SNP, all hypotheses that tested a general SNP effect had a high -log10(pval), and conversely for hypotheses testing a divergent SNP effect. When simultaneously interpreting all tests, Vgt2 appeared to have an effect that is conserved between genetic groups. Such a QTL can easily be detected in a multi-group population sample using a standard GWAS model [1]. However, many QTLs showed more complex patterns.

When group-specific allele effects are only due to group differences in LD or group-specific mutations at the QTL, the difference in allele effects should be conserved between the pure and the admixed genetic backgrounds. A first QTL matching this situation is Vgt1 [22, 47, 48] (candidate gene: ZmRap2.7), which was detected by a SNP located on chromosome 8. High -log10(pval) were observed when testing for a divergent SNP effect between ancestries, suggesting a local genomic difference. It remains difficult to disentangle the effect of LD from that of a genetic mutation without complementary analysis. LD was shown to be different between groups, with a higher LD extent in the dent group (S5 Fig), while LD phases appeared well-conserved at short distances (S6 Fig). However, a strong overall conservation of LD phases at short distances does not exclude a specific configuration for a given SNP-QTL pair. Note that Vgt1 was surprisingly not detected using the MITE located 548 Kbp before the detected SNP. [48] already showed the existence of other genetic variants being more associated with maize flowering than the MITE in the vicinity of Vgt1, such as CGindel587. Another QTL (QTL4.1) was detected by a SNP located on chromosome 4 and had a very similar profile to that of Vgt1. Its position is close (< 700 Kbp) to GRMZM2G126253, a candidate gene for maize flowering time proposed by [60]. To validate the hypothesis of a local genomic difference at these QTLs, one could produce near isogenic lines with the two alleles from both ancestries introgressed into dent and flint genetic backgrounds. A phenotypic evaluation of these individuals would give a definitive proof of a local genomic difference.

Group-specific allele effects may also be due to an interaction with the genetic background. A first QTL matching this profile was detected by a SNP in the vicinity of Vgt3 on chromosome 3 [54, 55] and its candidate gene ZmMADS69 [65]. This QTL showed an effect varying according to the genetic background: large in the dent, intermediate in the admixed and small in the flint. A high -log10(pval) was observed for tests that supported this hypothesis: a dent SNP effect in the dent genetic background (Δ^m_{DD}) and a divergent dent SNP effect between genetic backgrounds (Δ^m_{DD−DA}). If this interaction with the background involves numerous loci, introgressing alleles from a dent into a flint genetic background may lead to disappointing results, as the effect would probably vanish with repeated back-cross generations. If interactions mostly involve a single locus, the effect at Vgt3 is conditioned by the allele at the other locus, so that a simultaneous introgression may be necessary to reach the desired effect. Using near isogenic lines that cumulated an early mutation at Vgt1 [66] and the early allele at Vgt3, the effect of Vgt3 was shown to vanish in the presence of the early allele of Vgt1 (A. Charcosset pers. comm.), which supports the hypothesis of Vgt3 interacting with the genetic background. Recently, [65] demonstrated the action of ZmMADS69, the candidate gene of Vgt3, as an activator of the regulatory module ZmRap2.7-ZCN8, whose components are the candidate genes of Vgt1 and Vgt2, respectively. The existence of such interactions is consistent with flowering time being controlled by a network of interacting loci, as is now well established in the model species Arabidopsis [67].

Other examples of QTLs interacting with the genetic background were identified. Two of them featured a similar profile in the sense that they mainly exhibited a QTL effect in the admixed genetic background. One was located on chromosome 2 (QTL2.1) and showed a flint effect in the admixed genetic background, while the other QTL was located on chromosome 7 (QTL7.2) and showed an opposite dent effect between the dent and the admixed genetic backgrounds. Such QTLs are interesting as they are mainly revealed when creating admixed genetic material. They also suggest complex epistatic interactions between QTLs for these traits. The position of QTL2.1 is close (< 1.4 Mbp) to ereb197 and the position of QTL7.2 is close (< 100 Kbp) to dof47. Both are candidate genes for maize flowering time proposed by [60].

The existence of epistatic interactions was also evaluated globally by decomposing the genetic variance into an additive and an epistatic component, as suggested by [68]. This confirmed the existence of epistatic interactions between pairs of loci for FF and MF (S5 Table) and supported the possibility of QTLs interacting with the genetic background, resulting from epistatic interactions with loci that have differentiated allele frequencies between groups. It would be interesting to test the existence of epistatic interactions between each pair of loci. However, a filtering on crossed allele frequencies between pairs of loci would lead to discarding most SNPs from the analysis. Another possibility would be to test the epistatic variance of each SNP against the polygenic background, as proposed by [69-71].
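The global test of epistasis relies on a covariance matrix for pairwise epistatic deviations obtained as the Hadamard (element-wise) product of the kinship matrix with itself, and on a likelihood-ratio test between the models with and without the epistatic term. The sketch below illustrates these two ingredients only. The kinship matrix and log-likelihood values are hypothetical, the function names are not from the study, and referring the LR statistic to a chi-square distribution with one degree of freedom is an assumption (a mixture distribution is often used when a variance component is tested at the boundary of its parameter space).

```python
import math
import numpy as np

def epistatic_kinship(K):
    """Hadamard (element-wise) product of a kinship matrix with itself."""
    K = np.asarray(K, dtype=float)
    return K * K

def lr_test_pvalue(loglik_full, loglik_reduced):
    """LR statistic and p-value, assuming a chi-square reference with 1 df."""
    lr = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    pval = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square, 1 df
    return lr, pval

# Hypothetical kinship between three inbred lines.
K = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
print(epistatic_kinship(K))

# Hypothetical log-likelihoods of the models with and without the epistatic term.
lr, pval = lr_test_pvalue(-100.0, -103.0)
print(f"LR = {lr:.2f}, p = {pval:.4f}")
```

Because coefficients between 0 and 1 shrink when squared, the epistatic covariance between distantly related lines decays faster than the additive one.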

Conclusion

In this study, we proposed an innovative multi-group GWAS method which accounts and tests for the heterogeneity of QTL allele effects between groups. The addition of admixed individuals to the dataset was useful to disentangle the factors causing the heterogeneity of allele effects, being either local genomic differences or epistatic interactions with the genetic background. Only homozygous inbred lines were considered in this study, but the method may be generalized to heterozygous individuals. Recently many studies focused on the problem of genomic prediction across genetic groups [42, 72–75]. In such scenarios, the stability of QTL effects across genetic backgrounds is an important factor impacting the prediction accuracy. It is also an important factor of the relevancy of any marker based diagnostic in complex/structured populations. Our approach opens new perspectives to investigate this stability in a wide range of species.

S1 Fig. Imputation diagram of admixed lines.

Diagram illustrating the procedure applied to impute admixed DH lines from 15K to 600K SNPs using the parental origin of alleles. (TIF)

S2 Fig. Histogram of dent genome proportion among admixed lines.

(TIF)

S3 Fig. Genome-wide selection biases among admixed lines.

Absolute difference, along each chromosome, between the allele frequency of the reference allele observed in the admixed lines and its expected value. The expected allele frequencies were computed as the mean of flint and dent allele frequencies estimated on the parental lines by taking into account the contribution of each parent. A cubic smoothing spline was adjusted using the R function “smooth.spline”, and plotted in red. (TIF)

S4 Fig. PCoA on genetic distances using the set of 9,015 shared SNPs between the 600K and 15K datasets.

Individuals were colored depending on their genetic background: dent, flint or admixed. (TIF)

S5 Fig. LD extent.

LD extent estimated separately in dent and flint genetic groups using the standard r². LD was calculated and averaged for loci pairs characterized by a similar physical distance ranging from 0 to 2 Mbp, considering a sliding window of 1 Kbp. A cubic smoothing spline was adjusted for each group, using the R function “smooth.spline”. (TIF)

S6 Fig. Conservation of LD phases.

Conservation of LD phases estimated using the correlation (a) between the r of dent and flint groups, and (b) between the signs of r in the dent and flint groups. LD was calculated and averaged for loci pairs characterized by a similar physical distance ranging from 0 to 2 Mbp, considering a sliding window of 1 Kbp. A cubic smoothing spline was adjusted for each method, using the R function “smooth.spline”. (TIF)

S7 Fig. Position of QTLs detected for FF.

Position of QTLs detected for FF with an FDR of 20% using (a) M1, (b) M2 and (c) M3. The size of the grey dots is proportional to the -log10(pval) of the test at the most significant SNP of the region. (TIF)

S8 Fig. QQ-plots of M1 for MF.

(TIF)

S9 Fig. QQ-plots of M2 for MF.

(TIF)

S10 Fig. QQ-plots of M3 for MF.

(TIF)

S11 Fig. QQ-plots of M1 for FF.

(TIF)

S12 Fig. QQ-plots of M2 for FF.

(TIF)

S13 Fig. QQ-plots of M3 for FF.

(TIF)

S1 Table. Parameters estimated in the phenotypic analysis.

The lines “Row-Column” refer to the modeling of rows and columns as defined by the experimental design. AR1 refers to the autoregressive model AR1, while IID refers to the modeling of rows and columns as being independent and identically distributed among rows and among columns for a given trial. For more information, see the ASReml-R reference manual by [49]. The mean of each trial j (with j ∈ {2015, 2016}) was computed following: where N_k is the number of individuals (genotypes) in genetic background k (with k ∈ {D, A, F}) and N is the total number of individuals. The mean of each genetic background was computed following: . The genetic variance of each genetic background k and the GxE variance of each genetic background k in each trial j were also reported. The heritabilities of each genetic background k were computed as: where is the mean number of genotype replicates in trial j. (XLSX)

S2 Table. Parameters estimated using the general polygenic model.

The parameters included the mean μ, the genetic variance of each genetic background k, the genetic covariance between genetic backgrounds k and k′, and the error variance, with k ∈ {D, A, F}. The genetic correlations r between genetic backgrounds were also reported. (XLSX)

S3 Table. Information regarding significant SNPs for MF.

Information regarding significant SNPs for MF using all GWAS strategies: the name of the SNP, the chromosome on which it is located, its position in bp along the chromosome, the frequency of the allelic state observed in the dataset in which it was tested, the GWAS model applied, the hypothesis tested, the estimated values of the contrast (Delta), the Wald statistics and the -log10(pval) of the test (obtained from the approximate model for M2 and M3), and the FDR for which it was declared significant. (XLSX)

S4 Table. Information regarding significant SNPs for FF.

Information regarding significant SNPs for FF using all GWAS strategies: the name of the SNP, the chromosome on which it is located, its position in bp along the chromosome, the frequency of the allelic state observed in the dataset in which it was tested, the GWAS model applied, the hypothesis tested, the estimated values of the contrast (Delta), the Wald statistics and the -log10(pval) of the test (obtained from the approximate model for M2 and M3), and the FDR for which it was declared significant. (XLSX)

S5 Table. Additive, epistatic and residual variance components for each trait with the p-value (pval) of the epistatic component using a likelihood-ratio (LR) test.

The existence of epistasis can be investigated using a test based on variance components. The epistatic variance component between pairs of loci was estimated on the joint dent, flint and admixed dataset using a model neglecting genetic structure: y = 1μ + g + ω + e, where y is the vector of phenotypes, 1 is a vector of 1, μ is the global intercept, g is the vector of additive genetic values with g ~ N(0, Kσ²_G), K is the kinship matrix computed following Eq (2) and assuming a common genetic background for all individuals, i.e. using the average frequency of allele 1 at each locus, σ²_G is the global genetic variance, ω is the vector of global epistatic deviations with ω ~ N(0, (K∘K)σ²_{G×G}), σ²_{G×G} is the epistatic genetic variance between pairs of loci, e is the vector of errors with e ~ N(0, Iσ²_E), I is the identity matrix and σ²_E is the error variance. Note that K∘K is the Hadamard product of the kinship matrix with itself. This model can be seen as a simplified version of the one proposed by [68], as purely homozygous lines were used. The epistatic variance component was tested using a LR test between this model and the same model without the term ω. (XLSX)

S1 Appendix. Interpretation of the test .

This appendix shows that tests for an epistatic interaction between the SNP and the genetic background. (PDF)

S2 Appendix. False discovery rate and statistical power of GWAS models.

In this appendix, the properties of the new GWAS models were evaluated in terms of false discovery rate and statistical power of the tests. (PDF)

68 in total

1. Flowering time in maize: linkage and epistasis at a major effect locus.

Authors: Eléonore Durand; Sophie Bouchet; Pascal Bertin; Adrienne Ressayre; Philippe Jamin; Alain Charcosset; Christine Dillmann; Maud I Tenaillon
Journal: Genetics Date: 2012-01-31 Impact factor: 4.562

2. Genetic architecture of flowering time in maize as inferred from quantitative trait loci meta-analysis and synteny conservation with the rice genome.
