Estimating genetic variance contributed by a quantitative trait locus: A random model approach.

Shibo Wang¹, Fangjie Xie¹, Shizhong Xu¹.

Abstract

Detecting quantitative trait loci (QTL) and estimating QTL variances (represented by the squared QTL effects) are two main goals of QTL mapping and genome-wide association studies (GWAS). However, there are issues associated with estimated QTL variances and such issues have not attracted much attention from the QTL mapping community. Estimated QTL variances are usually biased upwards due to estimation being associated with significance tests. The phenomenon is called the Beavis effect. However, estimated variances of QTL without significance tests can also be biased upwards, which cannot be explained by the Beavis effect; rather, this bias is due to the fact that QTL variances are often estimated as the squares of the estimated QTL effects. The parameters are the QTL effects and the estimated QTL variances are obtained by squaring the estimated QTL effects. This square transformation failed to incorporate the errors of estimated QTL effects into the transformation. The consequence is biases in estimated QTL variances. To correct the biases, we can either reformulate the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the bias by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies. The method has been applied to QTL mapping for the 10-week-body-weight trait from an F2 mouse population.

Year: 2022 PMID: 35275920 PMCID: PMC8942241 DOI: 10.1371/journal.pcbi.1009923

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

This is a PLOS Computational Biology Methods paper.

1 Introduction

Quantitative trait locus (QTL) mapping [1] and genome-wide association studies (GWAS) [2] are the main tools for identifying genomic regions harboring quantitative trait loci. These QTL regions are the targets for molecular geneticists to further expand the experiments, to clone the actual genes for agronomic traits and to help breeders develop optimal marker assisted selection (MAS) programs [3]. Goring et al. [4] stated that the primary goal of QTL mapping and GWAS is to locate QTL and the secondary goal is to quantify the sizes of QTL. The size of a QTL is represented by the squared QTL effect or the QTL variance. We believe that estimating the variances of QTL is equally important as locating the QTL because only QTL with large effects are useful for application while small effect but statistically significant QTL are not economically meaningful. Statistical significance is primarily determined by the sample size. A small-effect QTL can be detected in a very large sample, but such a small-effect QTL is useless in any breeding programs. The final reported variance of a detected QTL is often converted into the proportion of phenotypic variance contributed by the QTL, called the QTL heritability [5,6]. In addition, whether a QTL is large or small is determined relative to the residual or phenotypic variance. In interval mapping [1], composite interval mapping [7,8] and genome-wide association studies [2], the effect of a QTL appears as a regression coefficient in a linear model or a linear mixed model. The regression coefficient is a parameter in the model. The least squares or maximum likelihood estimate of a QTL effect is often unbiased [1,9]. However, when the unbiased estimate of the QTL effect is converted into a squared QTL effect, i.e., QTL variance, the estimated variance is no longer unbiased. The bias can be substantially high for small-effect QTL detected from small samples [6]. 
This bias is not related to the Beavis effect, which is primarily caused by significance tests [10-12]. Biased estimates of QTL variances discussed in the previous literature are almost all related to significance tests, i.e., the Beavis effect. However, the bias can occur even if no significance test is associated with the estimation, and this bias has been virtually ignored in the QTL mapping community [6]. There has been little theoretical explanation for it. Broman [13] and Allison et al. [14] also noted that estimates of non-significant QTL effects may be biased, primarily due to the constraints on QTL parameters. For example, if QTL heritability is the parameter, its solution space must be constrained between 0 and 1. If the true QTL heritability is close to 0 or close to 1, then the estimated QTL heritability will be biased towards the middle of the constrained interval. The Beavis effect is the phenomenon that QTL reported from relatively small samples are often larger than they actually are [4,10-12]. The current study focuses on bias in estimated QTL variances that is due not to the Beavis effect but to the use of a wrong statistical model. The current models for QTL mapping and genome-wide association studies are linear models and linear mixed models [2,8,15,16]. The effect of a QTL appears in these models as a parameter that is subject to estimation. The QTL variance is often defined as the squared QTL effect and the estimated QTL variance is simply obtained by squaring the estimated QTL effect [5,6]. In general, the QTL variance is determined by the QTL effect and the frequency of the QTL alleles in the target population [17]. For example, in an F2 design of QTL mapping, suppose that we code the genotype indicator variable as $Z = 1$ for A1A1, $Z = 0$ for A1A2 and $Z = -1$ for A2A2 [18]. Let $\alpha$ be the average effect of gene substitution, i.e., the QTL effect [19]. In the absence of dominance and segregation distortion, the QTL variance is defined as $\sigma_g^2 = \sigma_Z^2\alpha^2$, where $\sigma_Z^2 = 1/2$ is the variance of $Z$ in an F2 population.
In classical quantitative genetics [19,20], the effect of a quantitative trait locus is treated as a fixed effect but the genotype indicator variable is treated as random. The QTL variance is defined as $\sigma_A^2 = 2pq\alpha^2$, where $p$ is the frequency of the “high” allele, $q = 1 - p$ is the frequency of the “low” allele and $2pq$ is the variance of the genotype indicator variable under the assumption of Hardy-Weinberg equilibrium. The genotype indicator variable $Z$ here is coded as the number of “high” alleles in each of the three genotypes, i.e., $Z = 2$ for genotype A1A1, $Z = 1$ for genotype A1A2 and $Z = 0$ for genotype A2A2. The textbook [19] provides a model for the variance but presents no estimation of the variance. When the dominance effect is absent ($d = 0$) or the two alleles have an equal frequency, the average effect of gene substitution is defined as $\alpha = a + d(q - p) = a$, where $a = (G_{11} - G_{22})/2$ is called the “additive effect.” The genotypic value $G_{11}$ is interpreted as the average trait value of all individuals with genotype A1A1 and $G_{22}$ is the average trait value of all individuals with genotype A2A2. The genotypic values ($G_{11}$ and $G_{22}$) are not parameters estimated from a finite sample but the true genotypic values, as if obtained from an infinitely large sample. In classroom teaching, an instructor may use a finite sample to demonstrate how $G$ is obtained, but the genotypic values are defined as the true values. A naïve estimate of the additive variance is $\hat\sigma_A^2 = 2pq\hat\alpha^2$, which is an overestimate of the additive variance. The statistical models of QTL mapping in a designed experiment are fixed models because the QTL effect ($\alpha$) is the parameter subject to estimation and no distribution is assigned to this fixed effect. We are not criticizing the fixed effect models in classical quantitative genetics; rather, we point out that the naïve estimate of the QTL variance (the estimated effect squared) is biased. Gianola et al. [21] first systematically investigated the properties of this QTL variance.
They assigned a normal distribution to α, where the variance of that distribution is interpreted as a prior uncertainty in the Bayesian framework. Chen et al [22] also proposed to treat the QTL effect as random and the QTL variance as the parameter. In fact, Chen et al [22] investigated the problem from an empirical Bayes point of view so that the QTL variance is the parameter of the prior distribution of the QTL effect. More detailed analysis of QTL variance can be found in Gianola et al. [23]. If we treat the QTL variance as the parameter of interest and directly estimate the QTL variance, the bias will disappear. The first random model approach to QTL mapping was proposed by Fernando and Grossman [24] for pedigree data analysis followed by the random model interval mapping developed by Xu and Atchley [25] for sib family data analysis. The repeated F2 design of experiment [17,26] is an extension of the simple F2 design of experiment initiated by crossing a common parent with multiple independent inbred lines. Since multiple parents are involved, the effects of parental alleles are treated as random effects with mean zero and an unknown variance. This variance is the QTL variance, which can be estimated via the maximum likelihood method. The QTL variance is tested with a likelihood ratio test [17,26]. QTL variance estimated this way is asymptotically unbiased or with little bias in finite samples. More random model QTL mapping procedures were developed in a short span of a half dozen years towards the end of the 20th century [27,28]. When the QTL effect is treated as a random effect, the parameter is the QTL variance and thus no bias or little bias is expected for the estimated QTL variance. Therefore, treating the QTL effect as a random effect and estimating the variance of the random effect is an alternative way of estimating QTL variance. 
We call the models with QTL effects treated as random effects random models, although they can be mixed models, technically, because a random polygenic effect may be included in the models. Note that we are talking about the bias in an estimated QTL variance, regardless whether the QTL is statistically significant or not. The sib-pair regression analysis of QTL mapping [29-32] is a fixed model (not a random model), but the parameter itself (regression coefficient) is already the QTL variance and thus there is no bias associated with the QTL variance in sib-pair regression analysis. The bottom line is that if the parameter subject to estimation is already the QTL variance, no bias or little bias is expected other than the bias caused by the Beavis effect. Some genomic selection models have been adopted for multiple locus GWAS, e.g., models of the Bayesian alphabet for genomic selection [21,33-36]. In Bayes A, markers of the entire genome are included in a single model. Because the number of markers can be substantially larger than the sample size, each marker effect is assigned a normal prior with mean zero and an unknown variance. The prior variance of each marker is further described by a hyper prior distribution so that the marker variance can be obtained via the posterior mean or posterior mode estimation [34,37-40]. Marker variances obtained this way are not biased because they are directly estimated from the data, not converted from the squared marker effects. In Bayes B, each marker effect is assigned a mixture of two distributions, one is a normal distribution and the other is just a zero with some non-zero probability mass [34,41]. The variance of the normal distribution in the mixture is the marker variance. This variance is also unbiased or with very little bias because the variance is not converted from the squared marker effect. 
In contrast to the Bayesian alphabet series of genomic selection models, the genomic best linear unbiased prediction (GBLUP) [42], which is the same as ridge regression [43,44], cannot be used for GWAS in its original form because all markers are assigned the same normal distribution. The single variance is shared by all markers and is severely shrunk towards zero. However, the test statistic of each marker from the ridge regression can be de-shrunk to a level comparable to that of the typical mixed model GWAS [45-47]. Duarte et al. [45] de-shrank the test for each marker so that the Wald test statistic was brought back to a level similar to the test of EMMA [15,48]. However, Duarte et al. [45] only de-shrank the test, and the estimated effect for each marker remains the same as in the ridge regression. The two-step ridge regression approach to GWAS developed by Shen et al. [46] de-shrank both the effects and the tests. The de-shrunk marker variances may be used to calculate the QTL heritability. Wang and Xu (2020) recently developed another de-shrinking method that can de-shrink the test, the estimated marker effect, and the estimated marker variance. This variance is unbiased and can be directly used to calculate the QTL heritability. The methods summarized here are various extensions of the genomic selection models. They are not the typical methods of GWAS, which are represented by EMMA and GEMMA [15,49]. An unbiased estimate of the QTL variance will lead to a less biased estimate of the QTL heritability, which is expressed by $\hat h_{QTL}^2 = \hat\sigma_g^2/(\hat\sigma_g^2 + \hat\sigma^2)$, where $\hat\sigma_g^2$ is the estimated QTL variance and $\hat\sigma^2$ is the estimated residual variance.
The QTL heritability has many different definitions: (1) the proportion of the phenotypic variance contributed by the QTL variance; (2) R squared (R2), defined as the ratio of the regression sum of squares to the total sum of squares; (3) the adjusted R2, which modifies R2 by accounting for the number of independent variables; and (4) the pseudo R2 [50,51], which is designed for logistic regression analysis of binary traits. All the R2-related measurements, except the adjusted R2, may be called the model goodness of fit. We will show in the discussion that the model goodness of fit is a biased estimate of the QTL heritability. The purpose of this study is to investigate the bias in the estimated QTL variance when the QTL effect is treated as a fixed effect. We show that the bias disappears when the QTL effect is treated as a random effect. We also propose a moment method to correct the bias. The bias due to significance tests (the Beavis effect) has been investigated by our laboratory in a recent study, where a truncated non-central Chi-square distribution was used to derive and correct the bias [52]. This study focuses only on the bias due to the use of an incorrect statistical model. We emphasize the conceptual issue more than the practical application.

2 Method

2.1 Model of a quantitative trait locus

Let $y$ be a vector of phenotypic values for a quantitative trait collected from a mapping population. The trait value can be described by the following linear mixed model,
$$y = X\beta + Z\alpha + \xi + \varepsilon, \tag{1}$$
where $X\beta$ represents fixed effects not associated with genes. If there are no fixed effects other than the population mean, $X\beta = 1\mu$, where $X = 1$ is a column vector of unity and $\beta = \mu$ is the population mean (or intercept). Let $g = Z\alpha$ be an $n \times 1$ vector of genotypic values for all individuals. The model is rewritten as
$$y = 1\mu + g + \xi + \varepsilon, \tag{2}$$
where $\mu$ is the population mean, $g$ is a vector of genotypic values for all individuals, $\xi$ is a vector of polygenic effects with an assumed $N(0, A\sigma_\xi^2)$ distribution, $A$ is an additive relationship matrix (also called the numerator relationship matrix), $\sigma_\xi^2$ is the polygenic variance, $\varepsilon$ is a vector of residual errors with an assumed $N(0, R\sigma^2)$ distribution, $R$ is a residual covariance structure (often assumed to be $R = I$) and $\sigma^2$ is the residual variance. Let $Z_j$ be the genotype indicator variable of individual $j$ for the locus of interest, which is defined as
$$Z_j = \begin{cases} +1 & \text{for } A_1A_1 \text{ with probability } P = p^2 \\ 0 & \text{for } A_1A_2 \text{ with probability } H = 2pq \\ -1 & \text{for } A_2A_2 \text{ with probability } Q = q^2 \end{cases}$$
where $p = \Pr(A_1)$ is the frequency of allele A1 and $q = \Pr(A_2)$ is the frequency of allele A2, with $p + q = 1$. The three capital letters, $P$, $H$ and $Q$, are the frequencies of the three genotypes, and the population is assumed to be in Hardy-Weinberg equilibrium. Let $\alpha$ be the genetic effect of the locus, which is often called the average effect of gene substitution in classical quantitative genetics textbooks [19,20]. Since no distribution is assigned to the QTL effect $\alpha$, it is a fixed effect. The genetic variance contributed by the locus under the fixed model is defined as
$$\sigma_g^2 = \mathrm{var}(Z\alpha) = \mathrm{var}(Z)\alpha^2 = \sigma_Z^2\alpha^2.$$
Here, the genetic effect $\alpha$ is a fixed effect (a constant) and $Z$ is a random variable with mean $\mu_Z = p - q$ and variance $\sigma_Z^2 = 2pq$. Although $\alpha$ is fixed, $g = Z\alpha$ is random because $Z$ is random. The Hardy-Weinberg equilibrium assumption is not required; we make it here only to be consistent with the classical definition of genetic variance in quantitative genetics textbooks [19,20].
When $Z$ is considered as a random variable (unlike in classical mixed models, where a design matrix is usually considered as data), the expectation of the mixed model is $E(y) = 1\mu$ and the variance of the mixed model is
$$\mathrm{var}(y) = \mathrm{var}(Z)\alpha^2 + A\sigma_\xi^2 + R\sigma^2.$$
This is an $n \times n$ variance matrix, where $n$ is the sample size. The total phenotypic variance and the partitioning of the total variance are shown below,
$$\sigma_P^2 = n^{-1}\mathrm{tr}[\mathrm{var}(y)] = n^{-1}\mathrm{tr}[\mathrm{var}(Z)]\alpha^2 + n^{-1}\mathrm{tr}(A)\sigma_\xi^2 + n^{-1}\mathrm{tr}(R)\sigma^2,$$
where $n^{-1}\mathrm{tr}[\mathrm{var}(Z)] = \sigma_Z^2$, $n^{-1}\mathrm{tr}(A) = 1$ (assuming that no individuals are inbred), and $n^{-1}\mathrm{tr}(R) = 1$ (assuming independent and homogeneous residual variance). Therefore,
$$\sigma_P^2 = \sigma_Z^2\alpha^2 + \sigma_\xi^2 + \sigma^2.$$
Let $\sigma_g^2 = \sigma_Z^2\alpha^2$; the proportion of phenotypic variance contributed by the QTL is
$$h_{QTL}^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_\xi^2 + \sigma^2}. \tag{8}$$
So far we have dealt only with the model and have not mentioned any estimation of the QTL variance, which will be discussed later. There is no doubt that model (1) or (2) is a mixed model because the same model includes both the fixed effects ($X\beta$) and the random effects ($Z\alpha + \xi$). However, in a typical mixed model, the design matrices ($X$ and $Z$) are treated as observed data and are considered as constants. In quantitative genetics, the design matrix $Z$ is considered as a variable, and this makes the quantitative genetics model different from a typical linear mixed model. If $\alpha$ is considered as a fixed effect and $Z$ is considered as “data”, the expectation of model (1) in a typical linear mixed model analysis would be $E(y) = X\beta + Z\alpha$ and the variance matrix would be
$$\mathrm{var}(y) = A\sigma_\xi^2 + R\sigma^2. \tag{9}$$
Information about the QTL disappears from the variance, which was first noted by Gianola et al. [21]. Therefore, model (9) is not a correct model for estimation of the QTL variance. If we assign a normal distribution to $\alpha$ with mean zero and variance $\sigma_\alpha^2$, model (1) remains a mixed model with an expectation of $E(y) = X\beta$ and a variance matrix of
$$\mathrm{var}(y) = ZZ^T\sigma_\alpha^2 + A\sigma_\xi^2 + R\sigma^2.$$
Now the QTL variance appears in the variance of $y$ and we can talk about the QTL variance and the proportion of phenotypic variance contributed by the QTL. We now need to interpret $\sigma_\alpha^2$ for a single $\alpha$.
According to Gianola et al. [21], $\alpha \sim N(0, \sigma_\alpha^2)$ is called a prior distribution for $\alpha$ and $\sigma_\alpha^2$ is the prior variance or prior uncertainty. Since there is only one random draw from this distribution per population, the variance is defined as $\sigma_\alpha^2 = \alpha^2$.
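The mean and variance of the genotype indicator variable used in Section 2.1 can be checked directly from the Hardy-Weinberg genotype frequencies. The following is a minimal sketch, assuming the +1/0/-1 coding for A1A1/A1A2/A2A2; the function names `indicator_moments` and `qtl_variance` and the example inputs are illustrative and do not come from the paper.

```python
def indicator_moments(p):
    """Mean and variance of Z coded +1/0/-1 under Hardy-Weinberg proportions."""
    q = 1.0 - p
    probs = {1: p * p, 0: 2 * p * q, -1: q * q}  # P, H and Q
    mean = sum(z * pr for z, pr in probs.items())
    var = sum((z - mean) ** 2 * pr for z, pr in probs.items())
    return mean, var


def qtl_variance(p, alpha):
    """Fixed-model QTL variance: var(Z) * alpha^2."""
    _, var_z = indicator_moments(p)
    return var_z * alpha ** 2


# Illustrative allele frequencies; p = 0.5 corresponds to an F2 population.
for p in (0.5, 0.3, 0.1):
    mean, var = indicator_moments(p)
    print(p, round(mean, 6), round(var, 6))
```

For any allele frequency the computed mean equals p - q and the computed variance equals 2pq, so in an F2 population (p = 0.5) the QTL variance is half the squared QTL effect.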

2.2 Estimated QTL variance and QTL heritability

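It is not surprising to see the following simple extension of Eq (8) to estimate the QTL heritability,
$$\hat h_{QTL}^2 = \frac{\sigma_Z^2\hat\alpha^2}{\sigma_Z^2\hat\alpha^2 + \hat\sigma_\xi^2 + \hat\sigma^2}. \tag{11}$$
Unfortunately, this is not the correct estimate of QTL heritability (Luo et al. 2003) because $E(\hat\alpha^2) \neq \alpha^2$. The estimate is biased upward, especially when the sample size is small. The reason is that $\alpha$ is unknown and is replaced by an estimate. However, the estimate is subject to an estimation error, $\mathrm{var}(\hat\alpha)$, which plays no role in Eq (11). The correct estimate of the QTL variance is
$$\hat\sigma_g^2 = \sigma_Z^2\left[\hat\alpha^2 - \mathrm{var}(\hat\alpha)\right]. \tag{12}$$
The estimated QTL heritability is simply
$$\hat h_{QTL}^2 = \frac{\hat\sigma_g^2}{\hat\sigma_g^2 + \hat\sigma_\xi^2 + \hat\sigma^2}.$$
It is a common practice to standardize the $Z$ variable prior to QTL mapping so that $Z = (Z^* - \mu_Z)/\sigma_Z$, where $Z^*$ represents the $Z$ variable in its original scale, and $\mu_Z$ and $\sigma_Z$ are the mean and standard deviation of the original $Z$ variable. The standardized $Z$ variable has $E(Z) = 0$ and $\mathrm{var}(Z) = 1$. Using the standardized $Z$ will result in
$$\hat\sigma_g^2 = \hat\alpha^2 - \mathrm{var}(\hat\alpha).$$
Hereafter, we use the standardized genotype indicator variable in all subsequent data analyses. Therefore, $\sigma_g^2 = \alpha^2$.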
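The difference between the naïve and the corrected heritability estimates can be made concrete with a few lines of code. This is a minimal sketch, assuming the fixed-model fit has already produced an estimated QTL effect, its standard error, and the two estimated variance components; the function name `qtl_heritability` and the numbers in the example are hypothetical, not values from the paper.

```python
def qtl_heritability(alpha_hat, se_alpha, var_poly, var_resid, var_z=1.0, correct=True):
    """Estimated QTL heritability from a fixed-model fit.

    alpha_hat : estimated QTL effect
    se_alpha  : standard error of alpha_hat (its square is var(alpha_hat))
    var_poly  : estimated polygenic variance
    var_resid : estimated residual variance
    var_z     : variance of the genotype code (1.0 when Z is standardized)
    correct   : subtract the squared standard error (moment correction) if True
    """
    var_qtl = var_z * alpha_hat ** 2
    if correct:
        var_qtl -= var_z * se_alpha ** 2  # may be negative for very weak QTL
    return var_qtl / (var_qtl + var_poly + var_resid)


# Hypothetical inputs for illustration only.
naive = qtl_heritability(1.0, 0.5, 4.0, 5.0, correct=False)
corrected = qtl_heritability(1.0, 0.5, 4.0, 5.0, correct=True)
print(round(naive, 4), round(corrected, 4))
```

Because the squared standard error is always positive, the naïve value always exceeds the corrected one, and the gap widens as the standard error grows (for example, in small samples).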

2.3 Treating QTL effect as random

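Terms like QTL variance and QTL heritability are defined in the context of a random QTL effect. However, all previous discussions are based on the fixed model framework for the QTL effect. Let us assume $\alpha \sim N(0, \sigma_\alpha^2)$. This treatment is a Bayesian analysis of the QTL effect. Recall that the linear mixed model in (1) is
$$y = X\beta + Z\alpha + \xi + \varepsilon. \tag{15}$$
When the QTL effect is treated as random, the expectation of $y$ in Eq (15) is $E(y) = X\beta$ and the variance matrix of $y$ in Eq (15) is
$$\mathrm{var}(y) = V = ZZ^T\sigma_\alpha^2 + A\sigma_\xi^2 + R\sigma^2 = (ZZ^T\lambda_\alpha + A\lambda_\xi + R)\sigma^2 = H\sigma^2,$$
where $\lambda_\alpha = \sigma_\alpha^2/\sigma^2$, $\lambda_\xi = \sigma_\xi^2/\sigma^2$ and $H = ZZ^T\lambda_\alpha + A\lambda_\xi + R$. The total phenotypic variance is partitioned below,
$$\sigma_P^2 = n^{-1}\mathrm{tr}(ZZ^T)\sigma_\alpha^2 + n^{-1}\mathrm{tr}(A)\sigma_\xi^2 + n^{-1}\mathrm{tr}(R)\sigma^2 = \sigma_\alpha^2 + \sigma_\xi^2 + \sigma^2,$$
where $n^{-1}\mathrm{tr}(ZZ^T) = 1$ due to $Z$ being defined as a standardized variable ($E(Z) = 0$ and $\mathrm{var}(Z) = 1$). We now introduce a restricted maximum likelihood (REML) method to estimate the QTL variance. Let $\theta = \{\sigma_\alpha^2, \sigma_\xi^2, \sigma^2\}$ be the three variance components. Given the expectation and the variance of model (15), the restricted log likelihood function is
$$L(\theta) = -\tfrac{1}{2}\ln|V| - \tfrac{1}{2}\ln|X^TV^{-1}X| - \tfrac{1}{2}(y - X\hat\beta)^TV^{-1}(y - X\hat\beta), \tag{19}$$
where $\hat\beta = (X^TH^{-1}X)^{-1}X^TH^{-1}y$, which is not a parameter but is expressed as a function of $H$ and thus a function of $\theta$. Therefore, the likelihood function contains only three variance components, i.e., three parameters. Maximization of (19) with respect to $\theta$ yields the REML estimate of $\theta$, denoted by $\hat\theta$. The price to pay for treating the QTL effect as random is that the solution is implicit and iterations are required for the REML estimate of the QTL variance. Given the estimated variance components, the estimated QTL heritability is
$$\hat h_{QTL}^2 = \frac{\hat\sigma_\alpha^2}{\hat\sigma_\alpha^2 + \hat\sigma_\xi^2 + \hat\sigma^2}.$$
The variance matrix of the estimated $\theta$ can be obtained via the inverse of the information matrix, $\mathrm{var}(\hat\theta) = I^{-1}(\hat\theta)$, where $I(\theta) = -\partial^2L(\theta)/\partial\theta\,\partial\theta^T$ is the 3 × 3 matrix of negative second partial derivatives of the restricted log likelihood with respect to the three variance components. The standard error of the estimated QTL heritability (a ratio of variance components) can be approximated via the Delta method. Let $X = \hat\sigma_\alpha^2$ and $Y = \hat\sigma_\alpha^2 + \hat\sigma_\xi^2 + \hat\sigma^2$ (scalars, not to be confused with the design matrix), so that $\hat h_{QTL}^2 = X/Y$. The variance-covariance matrix of $X$ and $Y$ is
$$\Sigma = \begin{bmatrix} \mathrm{var}(X) & \mathrm{cov}(X,Y) \\ \mathrm{cov}(X,Y) & \mathrm{var}(Y) \end{bmatrix} = T\,\mathrm{var}(\hat\theta)\,T^T, \quad T = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}.$$
Let $\nabla = [1/Y, \; -X/Y^2]^T$ be the vector of partial derivatives of $X/Y$ with respect to $X$ and $Y$. The approximate variance of the estimated QTL heritability via the Delta method is
$$\mathrm{var}(\hat h_{QTL}^2) \approx \nabla^T\Sigma\nabla = \left(\frac{X}{Y}\right)^2\left[\frac{\mathrm{var}(X)}{X^2} - \frac{2\,\mathrm{cov}(X,Y)}{XY} + \frac{\mathrm{var}(Y)}{Y^2}\right].$$
The standard error of the estimated QTL heritability is $\mathrm{se}(\hat h_{QTL}^2) = \sqrt{\mathrm{var}(\hat h_{QTL}^2)}$.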
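The Delta-method standard error of a ratio of variance components can be sketched in a few lines. The sketch below assumes that the numerator is the QTL variance, that the denominator is the sum of the three variance components, and that a 3 × 3 asymptotic covariance matrix of the estimates is available from the REML fit; the function name `heritability_se` and the example inputs are hypothetical and are not taken from the paper.

```python
import math


def heritability_se(theta, cov):
    """Delta-method standard error of h2 = theta[0] / sum(theta).

    theta : (QTL variance, polygenic variance, residual variance) estimates
    cov   : 3 x 3 asymptotic covariance matrix of the three estimates
    """
    x = theta[0]
    y = sum(theta)
    # X = a'theta with a = (1, 0, 0) and Y = b'theta with b = (1, 1, 1)
    var_x = cov[0][0]
    cov_xy = sum(cov[0][j] for j in range(3))
    var_y = sum(cov[i][j] for i in range(3) for j in range(3))
    h2 = x / y
    # First-order (Delta) approximation of var(X / Y)
    var_h2 = (var_x - 2.0 * h2 * cov_xy + h2 ** 2 * var_y) / y ** 2
    return h2, math.sqrt(var_h2)


# Hypothetical estimates and covariance matrix for illustration only.
theta_hat = (1.0, 4.0, 5.0)
cov_hat = [[0.25, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h2, se = heritability_se(theta_hat, cov_hat)
print(round(h2, 4), round(se, 4))
```

The variance expression is written without dividing by the numerator, so it remains well defined when the estimated QTL variance is zero.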

2.4 The MM and REML estimates of QTL variance

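When the genotype indicator variable is standardized, the estimated QTL variance presented in Eq (12) is rewritten as
$$\hat\sigma_\alpha^2 = \hat\alpha^2 - \mathrm{var}(\hat\alpha).$$
This is a moment estimate of the QTL variance. Let us take the expectation of $\hat\alpha^2$,
$$E(\hat\alpha^2) = \alpha^2 + \mathrm{var}(\hat\alpha). \tag{31}$$
The moment method of estimation for $\alpha^2$ is obtained by replacing $E(\hat\alpha^2)$ by $\hat\alpha^2$ in Eq (31), which leads to
$$\hat\alpha^2 = \hat\sigma_\alpha^2 + \mathrm{var}(\hat\alpha). \tag{32}$$
We then solve for $\hat\sigma_\alpha^2$ from Eq (32), resulting in an unbiased estimate of the QTL variance,
$$\hat\sigma_\alpha^2 = \hat\alpha^2 - \mathrm{var}(\hat\alpha).$$
This method is called the moment method (MM). The MM estimate of the QTL variance utilizes the result of a fixed model (the mixed model with the QTL effect treated as a fixed effect). The REML method for estimation of the QTL variance deals directly with a random model (the mixed model with the QTL effect treated as a random effect). Estimates from the two approaches are identical if negative estimates from MM are set to zero. Let $\hat\alpha$ be the estimated QTL effect from the fixed model and $s_{\hat\alpha}^2 = \mathrm{var}(\hat\alpha)$ be the squared estimation error. When the residual error of the fixed model in Eq (1) is normally distributed, the estimated QTL effect is also normally distributed, i.e., $\hat\alpha \sim N(\alpha, s_{\hat\alpha}^2)$. In this case, $\hat\alpha$ and $s_{\hat\alpha}^2$ are sufficient statistics of $\alpha$. To estimate the variance of $\alpha$ (the QTL variance $\sigma_\alpha^2$), we can simply obtain it from the sufficient statistics, not from the original data. Let us propose a random model for $\hat\alpha$ (treated as an observed data point),
$$\hat\alpha = \alpha + e, \tag{34}$$
where $\alpha \sim N(0, \sigma_\alpha^2)$ is the true value with a normal distribution and $e \sim N(0, s_{\hat\alpha}^2)$ is the residual error with a known error variance. The expectation of model (34) is $E(\hat\alpha) = 0$ and the variance is $\mathrm{var}(\hat\alpha) = \sigma_\alpha^2 + s_{\hat\alpha}^2$. The log likelihood function from the sufficient statistics is
$$L(\sigma_\alpha^2) = -\tfrac{1}{2}\ln(\sigma_\alpha^2 + s_{\hat\alpha}^2) - \frac{\hat\alpha^2}{2(\sigma_\alpha^2 + s_{\hat\alpha}^2)}.$$
The ML solution is
$$\hat\sigma_\alpha^2 = \begin{cases} \hat\alpha^2 - s_{\hat\alpha}^2 & \text{if } \hat\alpha^2 > s_{\hat\alpha}^2 \\ 0 & \text{otherwise} \end{cases} \tag{37}$$
which is exactly the MM estimate of $\sigma_\alpha^2$ if a negative solution is truncated at zero. A statistically more elegant notation for Eq (37) is $\hat\sigma_\alpha^2 = (\hat\alpha^2 - s_{\hat\alpha}^2)_+$. The equivalence between MM and REML will also be demonstrated empirically via Monte Carlo simulations later in the Result section.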
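The claim that the truncated moment estimate coincides with the maximum likelihood solution based on the sufficient statistics can be checked numerically. The sketch below assumes the one-observation random model described above (estimated effect = true effect + error with known error variance) and maximizes its log likelihood by a simple grid search over non-negative variances; the function names, grid settings and inputs are illustrative choices, not the authors' implementation.

```python
import math


def moment_estimate(alpha_hat, s2):
    """Untruncated moment estimate: squared effect minus squared standard error."""
    return alpha_hat ** 2 - s2


def loglik(var_alpha, alpha_hat, s2):
    """Log likelihood of alpha_hat ~ N(0, var_alpha + s2), constants dropped."""
    total = var_alpha + s2
    return -0.5 * math.log(total) - 0.5 * alpha_hat ** 2 / total


def ml_estimate(alpha_hat, s2, upper=50.0, steps=50000):
    """Grid-search maximizer of the log likelihood over var_alpha in [0, upper]."""
    best_v, best_l = 0.0, loglik(0.0, alpha_hat, s2)
    for i in range(1, steps + 1):
        v = upper * i / steps
        value = loglik(v, alpha_hat, s2)
        if value > best_l:
            best_v, best_l = v, value
    return best_v


# Hypothetical (alpha_hat, s2) pairs: one strong and one weak QTL signal.
for alpha_hat, s2 in ((2.0, 1.0), (0.5, 1.0)):
    mm = max(0.0, moment_estimate(alpha_hat, s2))
    print(alpha_hat, s2, round(mm, 3), round(ml_estimate(alpha_hat, s2), 3))
```

When the squared estimate is smaller than its squared standard error, the likelihood decreases monotonically in the variance, so the constrained maximum sits at zero, matching the truncation applied to the moment estimate.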

3 Data availability

3.1 Data of a working example from rice

Data and SAS code used in the working example in the Result section are given in the supplementary files. One file contains the phenotypic values and the numerical codes (before and after standardization) of the genotypes for the locus of interest (Bin725), where the raw and standardized codes are named z0 and z, respectively, and the phenotypic value is named y. A second file is the kinship matrix calculated from genome-wide markers (1619 bins). A third file contains the SAS code of PROC MIXED for parameter estimation.

3.2 Data of an application to QTL mapping in mice

The mouse population consists of 110 F2 mice derived from the cross between the B6 strain and the BTBR strain of mice [53]. The trait analyzed is the 10-week body weight. The mouse population was genotyped for 193 microsatellite markers over 19 autosomes with an average of 10 cM per marker interval. We added one pseudo marker every 5 cM to generate a map with a total of 466 marker positions (193 real markers and 273 pseudo markers). An n × n = 110 × 110 kinship matrix was calculated from the 466 marker genotypes and this kinship matrix was used for QTL mapping under the polygenic model. The GLIMMIX procedure in SAS was used to analyze the data. PROC GLIMMIX is a very general procedure that can handle generalized linear mixed models. The mouse data and the SAS code to analyze the data are provided in the supplementary files.

4 Result

4.1 A working example

An IMF2 (immortalized F2) population of rice with n = 278 hybrids was used as an example for illustration [54]. The trait is the 1000-grain weight (KGW). The experiment was replicated in two years (1998 and 1999). The average of the two-year replicates is the response variable for data analysis. There are m = 1619 bins (segregating markers) available for QTL mapping. The three genotypes (A, H and B) are coded as 1 for A, 0 for H and -1 for B. The kinship matrix was calculated from all 1619 markers and was normalized prior to the data analysis. A normalized kinship matrix has the property tr(K) = n, i.e., the trace of K equals the sample size. The data were originally published by Hua et al. [54] and later by Xu et al. [55]. We illustrate the analysis of a single marker (Bin725) as an example. This locus is known to contain a QTL for grain width (GW) [56]. The numerically coded genotypes of this locus were standardized prior to the data analysis. The phenotypic values (named y), the raw and standardized numerical codes of the genotypes for the locus (named z0 and z, respectively), the kinship matrix, and the SAS code of PROC MIXED are given in the supplementary files. The same data were fitted to two models. One is the so-called fixed model, where the QTL effect was treated as a fixed effect. The other is the so-called random model, where the QTL effect was treated as a random effect. Both models have a random polygenic component and thus both are mixed models. The parameters estimated from the two models were then compared. The fixed model estimates for the parameters are also the MM estimates, while the random model estimates are called the REML estimates. The two methods clearly give the same estimates. The MM method is computationally more robust than the REML method because it estimates two variance components while the REML method estimates three.
The estimated QTL variance from the fixed model is and thus the estimated QTL heritability from the fixed model is The estimated QTL heritability from the random model is The two estimates are nearly identical. The standard error of the estimated QTL heritability can be obtained from the random model because we have an asymptotic variance-covariance matrix of the three estimated variance components (). Let Define and , and let be the 3 × 3 asymptotic variance matrix listed in . The delta approximation of the variance for the estimated QTL heritability is The standard error of the estimated QTL heritability is Note that the standard error is even larger than the estimated QTL heritability itself, due to the relatively small QTL variance and the small sample size (n = 278). The naïve (biased) estimate of the QTL heritability is which is slightly higher than the unbiased estimate (0.06401). The relative bias is (0.06683–0.06401)/0.06401 = 4.4%. We also investigated the R2 of the mixed model with the QTL effect being treated as a fixed effect. Since the model is a mixed model with a random polygenic component, there is no easy way to calculate various sums of squares. As a result, we used the pseudo R2 [50] to measure the model goodness of fit, which is an alternative way to measure the proportion of phenotypic variance contributed by a QTL. The likelihood ratio test statistic is The pseudo R2 is which is higher than the unbiased (0.06401) and lower than the biased (0.06683).
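Two preprocessing steps in this working example, normalizing the kinship matrix so that its trace equals the sample size and standardizing the genotype codes, are simple to express in code. The sketch below uses toy inputs rather than the rice data, and it assumes that standardization uses the divisor n for the variance, which the text does not specify; the function names are illustrative.

```python
import math


def normalize_kinship(K):
    """Rescale a square kinship matrix so that its trace equals n."""
    n = len(K)
    scale = n / sum(K[i][i] for i in range(n))
    return [[value * scale for value in row] for row in K]


def standardize(z):
    """Center and scale genotype codes to mean 0 and variance 1 (divisor n, an assumption)."""
    n = len(z)
    mean = sum(z) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in z) / n)
    return [(v - mean) / sd for v in z]


# Toy inputs for illustration only (not the rice data).
K = normalize_kinship([[2.0, 1.0], [1.0, 4.0]])
z = standardize([1, 0, -1, 0, 1, 1])  # A = 1, H = 0, B = -1
print(sum(K[i][i] for i in range(len(K))), sum(v * v for v in z) / len(z))
```

With both steps applied, each variance component contributes to the phenotypic variance on the same scale, which is what allows the QTL heritability to be written as a simple ratio of variance components.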

4.2 Equivalence between the REML and MM estimates of a QTL variance

We fixed the population size at n × m = 10 × 5 = 50, where n is the number of families and m is the number of full siblings per family. The polygenic variance and the residual variance were fixed at $\sigma_\xi^2 = \sigma^2 = 10$. A QTL was simulated with frequencies of Pr(A1A1) = 0.25, Pr(A1A2) = 0.5 and Pr(A2A2) = 0.25. The numerical codes (Z variable) for the three genotypes were set at 1, 0 and -1, respectively. The Z variable was then standardized to have mean 0 and variance 1. Four simulation experiments were conducted under four different levels of QTL heritability ($h_{QTL}^2$): 0.05, 0.10, 0.15 and 0.20. The true $\alpha^2$ values were calculated from $\alpha^2 = h_{QTL}^2(\sigma_\xi^2 + \sigma^2)/(1 - h_{QTL}^2)$, where $\sigma_\xi^2 = 10$ and $\sigma^2 = 10$ are the polygenic and residual variances, respectively. The $\alpha^2$ values are 1.0526, 2.2222, 3.5294 and 5.0000, respectively, corresponding to the four levels of QTL heritability. Each experiment was replicated 500 times. The estimated QTL variances from the fixed model (truncated moment) and the random model (REML) were compared by plotting the fixed model estimate against the random model estimate. All points of the scatter plots are on the diagonal lines except a couple of points slightly deviating from the diagonals. The simulations empirically validated that the truncated moment method is equivalent to the REML method. The slight deviations between the two methods are due to local convergence of the REML method, which involves three variance components while the truncated moment method involves only two. In real data analysis, the random model analysis is not necessary because it is identical to the fixed model analysis and the latter is considerably faster than the former in terms of computational speed (see Discussion).
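The true QTL variances listed above follow from solving the heritability definition for the squared QTL effect. The short sketch below reproduces the four reported values, assuming a polygenic variance of 10 and a residual variance of 10 (the combination implied by the reported values) and a standardized genotype code; the function name is illustrative.

```python
def qtl_variance_from_h2(h2, var_poly=10.0, var_resid=10.0):
    """Squared QTL effect implied by a target QTL heritability (standardized Z)."""
    return h2 * (var_poly + var_resid) / (1.0 - h2)


for h2 in (0.05, 0.10, 0.15, 0.20):
    print(h2, round(qtl_variance_from_h2(h2), 4))
```

The formula comes from rearranging h2 = alpha^2 / (alpha^2 + polygenic variance + residual variance) for alpha^2.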

Comparison of the estimated QTL variances from the fixed model and the random model.

Moment estimate (fixed model) of QTL variance plotted against REML estimate (random model) of QTL variance when the true QTL heritability is (A) 0.05, (B) 0.10, (C) 0.15 and (D) 0.20.

4.3 Bias of estimated QTL variance

4.3.1 Single marker analysis

This method is similar to interval mapping, where one QTL is included in a regression model and there is no polygenic background control for multiple QTL. To show the bias in the estimated QTL variance and QTL heritability, we simulated data in the following scenarios. The residual variance was set at $\sigma^2 = 20$ and the population mean was set at $\mu = 10$. The QTL heritability ranged from 0 to 0.2 in increments of 0.001. The QTL genotype indicator variable (Z) was generated from three genotypes with frequencies of 0.25, 0.5 and 0.25, respectively. The squared QTL effect corresponding to a given QTL heritability was calculated from $\alpha^2 = h_{QTL}^2\sigma^2/(1 - h_{QTL}^2)$. The Z variable was standardized, i.e., $\mu_Z = 0$ and $\sigma_Z^2 = 1$, prior to data analysis. The sample size varied over the following levels: 25, 50, 100, 150, 200 and 250. A total of 500 replicated experiments were conducted under each scenario, and the average of the 500 replicates was plotted. The estimated QTL variances (squared method, moment method and restricted maximum likelihood method) are plotted against the true QTL variance in the accompanying figure. The naïve squared method (purple) is clearly biased upwards because the curves deviate far from the diagonal lines. The bias of the squared method decreases progressively as the sample size increases and is barely noticeable at n = 250. The REML method (blue curve) shows some bias when the true QTL variance is small and the sample is very small (n = 25 and n = 50), but the bias fades away quickly as the sample size reaches n = 100. The moment method (negative estimates allowed) shows no bias at any sample size or anywhere in the range of the true QTL variance.
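The upward bias of the squared estimate and its removal by the moment correction can be reproduced with a small Monte Carlo experiment. The sketch below is a simplified stand-in for the single-marker simulation, not the authors' code: it assumes simple linear regression on a genotype code standardized within each sample (divisor n), generates the phenotype from that standardized code, and uses Python's standard library random generator; the function name and the seed are arbitrary.

```python
import math
import random


def simulate_bias(n, h2, reps=500, var_resid=20.0, mu=10.0, seed=1):
    """Return (true QTL variance, mean naive estimate, mean moment estimate)."""
    rng = random.Random(seed)
    alpha2 = h2 * var_resid / (1.0 - h2)  # true QTL variance for standardized Z
    alpha = math.sqrt(alpha2)
    naive_total = moment_total = 0.0
    for _ in range(reps):
        z = rng.choices([1.0, 0.0, -1.0], weights=[0.25, 0.5, 0.25], k=n)
        while len(set(z)) < 2:  # redraw a degenerate (monomorphic) sample
            z = rng.choices([1.0, 0.0, -1.0], weights=[0.25, 0.5, 0.25], k=n)
        z_mean = sum(z) / n
        z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / n)
        zs = [(v - z_mean) / z_sd for v in z]
        y = [mu + v * alpha + rng.gauss(0.0, math.sqrt(var_resid)) for v in zs]
        y_mean = sum(y) / n
        # Least squares slope; the sum of squares of zs equals n by construction
        alpha_hat = sum(v * (t - y_mean) for v, t in zip(zs, y)) / n
        rss = sum((t - y_mean - v * alpha_hat) ** 2 for v, t in zip(zs, y))
        var_alpha_hat = rss / (n - 2) / n  # squared standard error of the slope
        naive_total += alpha_hat ** 2
        moment_total += alpha_hat ** 2 - var_alpha_hat
    return alpha2, naive_total / reps, moment_total / reps


true_var, naive_mean, moment_mean = simulate_bias(n=50, h2=0.05)
print(round(true_var, 4), round(naive_mean, 4), round(moment_mean, 4))
```

Under these assumptions the expected excess of the naïve estimate over the true QTL variance is the residual variance divided by n (0.4 for n = 50), which is why the bias shrinks as the sample size grows.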

Plots of estimated QTL variances from three methods against the true QTL variance.

The six panels of the figure show the results of six different sample sizes (n), which are 25, 50, 100, 150, 200 and 250, respectively.

We also compared the estimated QTL heritability under the six sample sizes. Here, we first calculated the average QTL variance estimated from 500 replicated simulation experiments. We then calculated the estimated QTL heritability from the average estimated QTL variance, i.e., as the proportion of the total variance (QTL variance plus residual variance) attributable to the QTL. The same trends observed for the estimated QTL variance were also observed for the estimated QTL heritability.

Plots of estimated QTL heritability from three methods against the true QTL heritability.

The six panels of the figure show the results of six sample sizes (n), which are 25, 50, 100, 150, 200 and 250, respectively.

4.3.2 Polygenic model analysis

The model includes a polygenic effect to control the genetic background. Such a mixed model is the GWAS model [2] and the polygenic controlled QTL mapping procedure [16]. This model is analogous to composite interval mapping, where the genetic background is controlled by selected co-factors. We simulated full-sib family data with m = 5 full siblings per family. The number of families in a simulated population determines the sample size. We set the number of families at n = 5, 10, 20, 30, 40, 50, corresponding to sample sizes n × m = 25, 50, 100, 150, 200, 250, respectively. The residual variance was set at $\sigma^2 = 10$ and the polygenic variance ($\sigma_\xi^2$) was also fixed. The QTL heritability is defined as $h_{QTL}^2 = \alpha^2/(\alpha^2 + \sigma_\xi^2 + \sigma^2)$, which allows us to calculate the QTL effect $\alpha$ via $\alpha = \sqrt{h_{QTL}^2(\sigma_\xi^2 + \sigma^2)/(1 - h_{QTL}^2)}$. Again, the genotypic code of a single QTL was standardized when used to generate and analyze the data. The true $h_{QTL}^2$ value ranges from 0 to 0.2 in increments of 0.001. Each experiment was replicated 500 times. The average estimated QTL variance from the 500 replicated simulations was plotted against the true QTL variance. The purple curves (the naïve squared method) are seriously biased in small samples (n = 25 and n = 50). However, the bias is very small for n = 100 and is barely noticeable when the sample size is above 150. The REML estimate is biased for n = 25 when the QTL variance is smaller than 2 (corresponding to a QTL heritability of 0.075). The moment estimate of QTL variance is unbiased at all sample sizes and over the entire range of QTL variance. Similar trends were observed for the QTL heritability.

Plots of estimated QTL variance against the true QTL variance under the polygenic model.

The six panels of the figure show the results of six sample sizes (n), which are 25, 50, 100, 150, 200 and 250, respectively.

Plots of estimated QTL heritability against the true QTL heritability under the polygenic model.

The six panels of the figure show the results of six sample sizes (n), which are 25, 50, 100, 150, 200 and 250, respectively.

Comparing the polygenic model figures with the corresponding single-marker figures, we see that adding a polygene to the model reduces the bias of the naïve squared method and the REML method relative to the corresponding single-marker analysis methods.

4.4 An application to QTL mapping for a mouse population

The mouse population consists of n = 110 F2 mice genotyped for 193 markers. Adding 273 pseudo markers uniformly across the entire genome generated a map with an average of 5 cM per marker interval. The total number of marker positions is 193 + 273 = 466. We scanned the entire genome with two different models under two different strategies of QTL mapping. The two models are the fixed model and the random model. In the fixed model, the fixed effects included the intercept, the sex effect (1 for male and 0 for female) and the standardized marker genotype indicator variable. No random effect was included in the fixed model other than the residual error. For the random model, the standardized marker genotype indicator variable was included in the model as a random effect, and the fixed effects included the intercept and the sex effect. The two QTL mapping strategies are the interval mapping procedure and the polygenic mapping procedure. The polygenic model, by definition, includes a polygenic effect to capture the polygenic background effect, while the interval mapping procedure does not include this polygenic effect.

The accompanying figure compares the two models under the two strategies of QTL mapping for the 10-week-body-weight trait of the mouse population. The blue circles plot the estimated QTL variances from the random model (QTL effect defined as a random effect) against the QTL variances from the fixed model moment method (QTL effect defined as a fixed effect). The red circles plot the QTL variance from the squared effect method against the QTL variance from the fixed model moment method. Clearly, the random model and the fixed model moment methods give identical estimates of the QTL variance because the blue circles are all on the diagonals of the plots, while the QTL variance estimated from the squared effect method is consistently biased upward because the red circles are all above the diagonals of the plots.

The left panels (A and C) of the figure show the estimated QTL variances from the two models under the two QTL mapping procedures. The right panels (B and D) compare the QTL heritability for the two models under the two QTL mapping procedures. The top panels (A and B) show the results from the interval mapping procedure, while the bottom panels (C and D) show the results from the polygenic model analysis. Comparing the two QTL mapping procedures (interval mapping vs. polygenic mapping), the biases in estimated QTL variance (heritability) are greater for the polygenic method than for the interval mapping procedure.

Comparison of QTL variance and heritability from three estimation methods (squared, random model and fixed model).

(A) and (B) Plots of estimated QTL variance and heritability from the square method and the random model method against the estimates from the fixed model approach for interval mapping. (C) and (D) Plots of estimated QTL variance and heritability from the square method and the random model method against the estimates from the fixed model approach for polygenic mapping.

The Wald test figure shows the Wald test statistic profiles and the QTL heritability profiles for the two QTL mapping procedures (interval mapping vs. polygenic mapping). The patterns of the profiles are much the same for the two procedures, but the profiles of the polygenic procedure are substantially lower than those of the interval mapping procedure. The Bonferroni-corrected threshold of the Wald test divides the nominal type I error rate by 193, the number of real markers. The interval mapping procedure detected a significant marker on Chromosome 2 that is associated with the body weight trait of mice (panel A). This marker, however, is not significant for the polygenic method (panel C) due to the strong shrinkage of the polygenic method. Interestingly, the marker with the highest Wald test statistic from the polygenic method is on Chromosome 18, with a Wald test statistic of 9.73 (not significant).

Wald tests and estimated QTL heritability for body weight of the F2 mouse population.

(A) and (C) show the Wald test statistics from interval mapping and polygenic mapping, respectively. (B) and (D) show the estimated QTL heritability from interval mapping and polygenic mapping, respectively.

We now describe the marker on Chromosome 2 with the highest Wald test statistic detected from the interval mapping procedure. This is a pseudo marker about 9 cM away from a real marker. The test statistic is W = 17.72 with p = 0.00002559. The estimated QTL heritabilities from the fixed model moment method, the random model method and the squared effect method are 0.1333, 0.1333 and 0.1401, respectively. The relative bias of the squared effect method is (0.1401 - 0.1333)/0.1333 = 5.14%.

None of the markers were significant under the polygenic method. The marker with the highest test statistic from the polygenic model analysis is on Chromosome 18 and it is a pseudo marker, 15 cM away from a real marker. The Wald test statistic of this pseudo marker is W = 9.73 with a p-value of p = 0.001812845. The estimated QTL heritabilities are 0.1155, 0.1153 and 0.1271, respectively, for the fixed model moment method, the random model method and the squared effect method. The relative bias of the squared effect method is (0.1271 - 0.1153)/0.1153 = 10.21%.
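The two p-values reported above are consistent with a Wald statistic referred to a chi-square distribution with one degree of freedom, which can be checked with the standard library alone. The sketch below also computes a Bonferroni threshold for 193 tests; the nominal level of 0.05 is an assumption made here for illustration (the level is not stated in this text), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_p_value(w):
    """Upper-tail p-value of a Wald statistic under a chi-square with 1 df."""
    # For 1 df, P(chi2 > w) = P(|N(0,1)| > sqrt(w)) = erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(w / 2.0))


def bonferroni_wald_threshold(n_tests, alpha=0.05):
    """Chi-square (1 df) critical value at per-test level alpha / n_tests."""
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))
    return z * z


p_chr2 = wald_p_value(17.72)   # interval mapping peak on Chromosome 2
p_chr18 = wald_p_value(9.73)   # polygenic mapping peak on Chromosome 18
threshold = bonferroni_wald_threshold(193)  # assumes a nominal level of 0.05
print(p_chr2, p_chr18, threshold)
```

Under the assumed 0.05 level, the Chromosome 2 statistic exceeds the threshold and the Chromosome 18 statistic does not, which matches the significance calls reported in the text.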

5 Discussion

In practice, the bias correction is only necessary for populations smaller than 200. Since most QTL mapping and GWAS experiments are probably conducted with sample sizes larger than 200, the current study is not primarily intended for crop and animal breeders. Tree breeders are a special group who often deal with small samples. QTL mapping and GWAS in trees may need bias correction for estimated QTL variances.

The current study contributes more to quantitative genetics theory than to practical data analysis. Typical QTL mapping and GWAS models include QTL effects as fixed effects, while the sizes of QTL are reported as QTL variances or QTL heritability. The concept of variance does not go well with a fixed effect; it is the random effect that involves a variance. This conceptual relationship has been confused in the QTL mapping community for over three decades. This study has clarified this fundamental relationship.

Another fundamental contribution of this study to statistics is the “randomized fixed model approach” to estimating variances. If the number of levels of a random effect is small, this randomized fixed model can be used to estimate the variance associated with the random effect. We estimate the fixed effect as the best linear unbiased estimate (BLUE) and then convert the estimated fixed effect into a variance, as we presented in Eq (30) for the estimated QTL variance. A single regression coefficient is considered as just one level of a random effect. For multiple levels of a random effect, say $\alpha = [\alpha_1\ \alpha_2]$, and if each level of the random effect follows the same distribution, say $\alpha_k \sim N(0, \sigma_\alpha^2)$ for k = 1,2, the randomized fixed model approach to estimating $\sigma_\alpha^2$ is a simple extension of the MM estimate, as shown below,

$$\hat\sigma_\alpha^2 = \frac{1}{2}\left[\hat\alpha_1^2 + \hat\alpha_2^2 - \mathrm{tr}\left(\mathrm{var}(\hat\alpha)\right)\right] \quad (40)$$

where $\mathrm{var}(\hat\alpha)$ is a 2 × 2 variance matrix. In quantitative genetics, the four epistatic effects per pair of loci (additive × additive, additive × dominance, dominance × additive and dominance × dominance) may be modeled as a random effect with four group levels.
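The moment correction behind the single-level estimate and its multi-level extension rests on a standard identity for the expectation of a squared estimator. The lines below are a sketch of that reasoning, assuming the effect estimates are unbiased and treating $\mathrm{var}(\hat\alpha)$ as known (in practice it is replaced by its estimate); the notation is chosen here for illustration and may differ from the paper's own derivation.

```latex
\begin{aligned}
E(\hat\alpha^2) &= \mathrm{var}(\hat\alpha) + [E(\hat\alpha)]^2
                 = \mathrm{var}(\hat\alpha) + \alpha^2
  \;\Longrightarrow\;
  E\left[\hat\alpha^2 - \mathrm{var}(\hat\alpha)\right] = \alpha^2, \\
E(\hat\alpha_1^2 + \hat\alpha_2^2) &= \alpha_1^2 + \alpha_2^2
  + \mathrm{tr}\left[\mathrm{var}(\hat\alpha)\right]
  \;\Longrightarrow\;
  E\left\{\tfrac{1}{2}\left[\hat\alpha_1^2 + \hat\alpha_2^2
  - \mathrm{tr}\left(\mathrm{var}(\hat\alpha)\right)\right]\right\}
  = \tfrac{1}{2}\left(\alpha_1^2 + \alpha_2^2\right).
\end{aligned}
```

The first line shows why squaring an unbiased effect estimate overstates the squared effect by the estimation error variance; the second line is the same argument applied to two levels and averaged.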
Each level of the epistatic effects follows the same normal distribution. The epistatic variance may be easily estimated using Eq (40) with the 2 levels in the formula replaced by the 4 levels of the epistatic model. The original random (or mixed) model methodology does not have an explicit estimate of a variance component for the ML and REML methods. The randomized fixed model provides an explicit solution. As a result, this new method is much like the Type3 method of the MIXED procedure in SAS, but it is the QTL mapping version of the Type3 method.

Variance components may also be estimated via the Bayesian method by assigning a prior distribution to each variance component. The Bayesian estimate of the QTL variance is drastically different from the maximum likelihood estimate of the QTL variance in the situation of regression analysis. The reason is that the variance is defined and estimated from a “single group level.” A good Bayesian estimate of a variance component needs at least three group levels [57-59]. The MCMC procedure in SAS was used to implement the Bayesian method for parameter estimation. Since coding PROC MCMC for the polygenic model using the marker-inferred kinship matrix is very difficult, we only investigated the simple model without polygenic background control. Five different prior distributions were investigated, including a uniform prior and a weakly informative half-Cauchy prior on σ. Please see the supporting text describing the Bayesian method for a complete list of the prior distributions investigated in this project. Table A in that supporting text shows the Bayesian estimates of parameters in comparison with the estimates from the restricted maximum likelihood methods for the hybrid rice data (data of the working example). The estimated QTL effects and residual variances across a range of prior distributions are much the same as the estimates from the restricted maximum likelihood methods.
However, the Bayesian estimates of the QTL variance are drastically different from the REML estimate and the differences are highly dependent on the prior distributions. Therefore, the proposed MM and REML estimates of the QTL variance under one group level may be the only option for estimating QTL heritability.

An alternative method to estimate the QTL heritability is the R squared ($R^2$), which does not rely on an estimated QTL variance. It requires partitioning of the total sum of squares into the regression sum of squares and the residual sum of squares. The ratio of the regression sum of squares to the total sum of squares is the R squared. This R squared is also called the coefficient of model determination, the model goodness of fit and so on. Taking interval mapping (single marker analysis without control for the polygenic background) as an example, the regression model is

$$y = \mu + Z\alpha + \varepsilon$$

where μ is the intercept and $\sigma_Z^2 = 1$ because variable Z has been standardized. The naïve estimate of the QTL heritability is defined as

$$\hat h_{QTL}^2 = \frac{\hat\alpha^2}{\hat\alpha^2 + \hat\sigma^2} \quad (42)$$

The R squared is defined as $R^2 = SS_R/(SS_R + SS_E)$, where $SS_R = (n-1)\hat\alpha^2$ is the regression sum of squares and $SS_E = (n-2)\hat\sigma^2$ is the residual sum of squares. The R squared is

$$R^2 = \frac{(n-1)\hat\alpha^2}{(n-1)\hat\alpha^2 + (n-2)\hat\sigma^2} \quad (43)$$

Comparing Eq (43) with Eq (42), we conclude that $R^2 \approx \hat h_{QTL}^2$. The equality is only approached asymptotically. Since $\hat h_{QTL}^2$ defined in Eq (42) is already biased, the R squared is certainly biased as well. The adjusted R squared, however, is a modification of the original R squared that takes into account the model size (number of independent variables). After a few steps of manipulation, the adjusted R squared can be expressed as

$$R_{adj}^2 = \frac{\hat\alpha^2 - \hat\sigma^2/(n-1)}{\hat\alpha^2 - \hat\sigma^2/(n-1) + \hat\sigma^2} \quad (44)$$

We now re-write the bias corrected estimate of the QTL heritability as

$$\hat h_{MM}^2 = \frac{\hat\sigma_\alpha^2}{\hat\sigma_\alpha^2 + \hat\sigma^2} = \frac{\hat\alpha^2 - \mathrm{var}(\hat\alpha)}{\hat\alpha^2 - \mathrm{var}(\hat\alpha) + \hat\sigma^2} \quad (45)$$

where $\hat\sigma_\alpha^2 = \hat\alpha^2 - \mathrm{var}(\hat\alpha)$ and $\mathrm{var}(\hat\alpha) = \hat\sigma^2/(n-1)$ due to the fact that $Z^TZ = n - 1$. Eqs (44) and (45) are identical and thus $\hat h_{MM}^2 = R_{adj}^2$. The bias corrected heritability is not the R squared goodness of fit but it is identical to the adjusted R squared.

The additive genetic variance of a quantitative trait locus presented in Eq (4) is given in classical quantitative genetics textbooks [19,20].
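The relationship among the naïve heritability, the R squared, the adjusted R squared and the moment-corrected heritability discussed in the R squared passage above can be checked numerically on a single-marker regression. The sketch below is an illustration under stated assumptions: the data are simulated with arbitrary hypothetical parameter values, the genotype code is standardized with the sample standard deviation (so its sum of squares is n - 1), and the function name is not from the paper.

```python
import random
import statistics


def heritability_summaries(z0, y):
    """Naive h2, R2, adjusted R2 and moment-corrected h2 for one marker."""
    n = len(y)
    zbar, sd = statistics.fmean(z0), statistics.stdev(z0)
    z = [(v - zbar) / sd for v in z0]  # standardized genotype code
    ybar = statistics.fmean(y)
    szz = sum(v * v for v in z)  # equals n - 1
    a_hat = sum(v * (w - ybar) for v, w in zip(z, y)) / szz
    ss_reg = a_hat ** 2 * szz  # regression sum of squares
    ss_res = sum((w - ybar - a_hat * v) ** 2 for v, w in zip(z, y))
    s2 = ss_res / (n - 2)  # residual variance estimate
    naive_h2 = a_hat ** 2 / (a_hat ** 2 + s2)
    r2 = ss_reg / (ss_reg + ss_res)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    qtl_var = a_hat ** 2 - s2 / szz  # moment estimate; may be negative
    mm_h2 = qtl_var / (qtl_var + s2)
    return naive_h2, r2, adj_r2, mm_h2


# Hypothetical data: 50 individuals, raw effect 1.5, residual SD 4.
rng = random.Random(7)
z0 = rng.choices([1.0, 0.0, -1.0], weights=[0.25, 0.5, 0.25], k=50)
y = [10.0 + 1.5 * v + rng.gauss(0.0, 4.0) for v in z0]
naive_h2, r2, adj_r2, mm_h2 = heritability_summaries(z0, y)
print(naive_h2, r2, adj_r2, mm_h2)
```

The adjusted R squared and the moment-corrected heritability agree up to floating-point rounding, while the plain R squared and the naïve heritability are both larger.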
Surprisingly, the additive genetic variance in Eq (4) is the result of a fixed model treatment of the QTL effect. The naïve estimate of the QTL variance converted from the squared effect is biased. With the random model, we can directly estimate the QTL variance via the REML method. However, there is no explicit solution for the REML estimation of the QTL variance, even if the model is a single marker model without a polygene. The MM method is derived under the fixed model, but the MM estimate is identical to the REML estimate. Under the single marker model (without the polygene), the MM estimate is explicit and much more convenient to obtain than the random model REML estimate. Under the polygenic model with the QTL effect treated as a fixed effect, there are two variance components, the polygenic variance and $\sigma^2$, and explicit solutions for the variance parameters are not available anyway.

The fixed model approach to estimating the QTL variance has two advantages over the random model approach. (1) In variance component analysis, adding one more variance can substantially increase the computational time, especially in large samples. Furthermore, in QTL mapping and GWAS, we are talking about estimating one more variance component for every locus, while the total number of marker loci can be up to millions. (2) Adding one more variance component to the parameter array can complicate the landscape of the restricted log likelihood function and increase the risk of local convergence.

Regarding the computational times, we compared the fixed model and the random model of PROC MIXED in SAS for the working example (IMF2 rice population with 278 lines at bin725). The fixed model and the random model took 3.40 and 3.49 CPU seconds, respectively. The corresponding numbers of iterations required for convergence were 6 and 5 for the fixed and random models, respectively. We then scanned the entire genome of 1619 markers for the same rice population with both the fixed model and the random model of PROC MIXED in SAS.
The fixed model took 1 hour, 31 minutes and 38 seconds of CPU time to complete the scan. The random model, however, took 2 hours, 20 minutes and 39 seconds of CPU time. The average numbers of iterations (over the 1619 loci) required for convergence were 11.73 and 12.25 for the fixed model and the random model, respectively.

Our common belief in QTL mapping and GWAS is that there are just a few detectable QTL per experiment. When the sample size is very large, more QTL can be detected. The current GWAS in human height has identified 83 associated SNPs with a sample size as large as 711428 individuals [60]. We are not interested in presenting QTL variances or QTL heritability for the entire genome; rather, only the variances of detected QTL need to be presented. Therefore, we can still use the fixed effect model to scan the genome and only go back to the significant loci to calculate the QTL variances. In this case, we can tolerate the extra cost of REML estimation of QTL variances for the limited number of significant loci.

One advantage of the REML estimation of the QTL variance is that an asymptotic variance matrix for the estimated variance components, including the QTL variance, is available in PROC MIXED. This matrix allows us to calculate an approximate standard error of the estimated QTL heritability via the Delta approximation. Under the fixed model, however, we only have the asymptotic variance matrix for the variance components actually fitted; the variance of the estimated QTL variance and its covariance with the other variance estimates are not available. Therefore, we cannot calculate the standard error of the estimated QTL heritability.

Another advantage of the random model (treating QTL effects as random) is the ability to calculate the heritability of a QTL in a multiple QTL model when linkage disequilibrium (LD) among markers is present. This problem has been investigated by Gianola et al. [23] under the fixed effect model. We now review the fixed model approach using three loci as an example.
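As an aside on the Delta approximation mentioned above for the standard error of the QTL heritability, the calculation can be sketched as follows for the simple model without a polygene. This is a generic first-order delta-method illustration, not the authors' implementation: the function name is hypothetical, the heritability is taken as QTL variance over QTL plus residual variance, and the numerical inputs are made up rather than taken from any SAS output.

```python
def heritability_se(var_qtl, var_res, asy_cov):
    """Delta-method standard error of h2 = var_qtl / (var_qtl + var_res).

    asy_cov is the 2 x 2 asymptotic covariance matrix of the two variance
    estimates, ordered as (var_qtl, var_res).
    """
    total = var_qtl + var_res
    h2 = var_qtl / total
    # Gradient of h2 with respect to (var_qtl, var_res).
    grad = (var_res / total ** 2, -var_qtl / total ** 2)
    var_h2 = sum(grad[i] * asy_cov[i][j] * grad[j]
                 for i in range(2) for j in range(2))
    return h2, var_h2 ** 0.5


# Hypothetical variance estimates and asymptotic covariance matrix.
h2_hat, se_hat = heritability_se(2.0, 8.0, [[1.0, 0.0], [0.0, 4.0]])
print(h2_hat, se_hat)
```

The calculation needs the full covariance matrix of the variance estimates, which is why it is available under the random model but not when only the fitted components of the fixed model have asymptotic variances.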
For the three-locus example, the linear mixed model contains the term $\sum_{k=1}^{3} Z_k\alpha_k$, where $Z_k$ is the genotype indicator variable for QTL k and $\alpha_k$ is the effect of QTL k for k = 1,2,3. If these QTL are treated as fixed effects, the genetic variance associated with the three loci is

$$\sigma_G^2 = \sum_{k=1}^{3}\alpha_k^2\sigma_{Z_k}^2 + 2\sum_{k<k'}\alpha_k\alpha_{k'}\sigma_{Z_kZ_{k'}}$$

where $\sigma_{Z_k}^2$ is the variance of $Z_k$ and $\sigma_{Z_kZ_{k'}}$ is the covariance between $Z_k$ and $Z_{k'}$. This covariance is called the linkage disequilibrium (LD). The variance contributed by the kth locus is [23]

$$\sigma_k^2 = \alpha_k^2\sigma_{Z_k}^2 + \sum_{k'\ne k}\alpha_k\alpha_{k'}\sigma_{Z_kZ_{k'}}$$

This QTL variance looks very strange because the variance of QTL k contains effects of other QTL. The genetic variance contributed by the three loci is collectively expressed by $\sigma_G^2 = \sum_{k=1}^{3}\sigma_k^2$. When the LD is absent, the covariance disappears and the variance of QTL k does not contain effects of other QTL. If the effects of all QTL are treated as random effects, the genetic variance contributed by the three loci is

$$\sigma_G^2 = \sum_{k=1}^{3}\sigma_{Z_k}^2\sigma_{\alpha_k}^2$$

where $\alpha_k \sim N(0, \sigma_{\alpha_k}^2)$ for k = 1,2,3. If the genotype indicator variables are standardized, $\sigma_{Z_k}^2 = 1$ for all k = 1,2,3, and the above genetic variance simplifies to $\sigma_G^2 = \sum_{k=1}^{3}\sigma_{\alpha_k}^2$. Therefore, treating multiple QTL effects as random substantially simplifies the genetic variance contributed by each QTL, even if LD is present.

A statistically significant QTL is not necessarily significant biologically if the QTL contributes a very small proportion of the trait variance. Conversely, a large QTL of biological significance may not be significant statistically. The lack of power to detect such a QTL is primarily due to interactions of the QTL with other QTL in linkage disequilibrium. For example, if two QTL are in high LD but one is an antagonist of the other, neither one may be detected because the effect of one locus is cancelled by the other. A multiple locus model may improve the detection of both loci. More importantly, each locus may act as a member of a genetic network that consists of many loci [61]. An individual locus may not be detected alone but is detectable collectively as a network.
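The fixed-model partition of the genetic variance under LD, and the cancellation between antagonistic loci described above, can be illustrated with a small numerical sketch. The per-locus formula used here (own term plus the covariance terms shared with the other loci) is the form attributed to Gianola et al. [23] as described in the text; the function names and the three-locus effects and covariances are hypothetical values chosen for illustration.

```python
def per_locus_variance(alpha, cov_z):
    """Fixed-model variance of each locus: a_k^2 var(Z_k) + sum_{j != k} a_k a_j cov(Z_k, Z_j)."""
    q = len(alpha)
    return [sum(alpha[k] * alpha[j] * cov_z[k][j] for j in range(q))
            for k in range(q)]


def total_genetic_variance(alpha, cov_z):
    """Variance of sum_k Z_k a_k when the effects are treated as fixed."""
    q = len(alpha)
    return sum(alpha[k] * alpha[j] * cov_z[k][j]
               for k in range(q) for j in range(q))


# Hypothetical example: loci 1 and 2 are in high LD (covariance 0.8) with
# opposite effects; locus 3 is independent. Z is standardized (unit variances).
alpha = [1.0, -1.0, 0.5]
cov_z = [[1.0, 0.8, 0.0],
         [0.8, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
parts = per_locus_variance(alpha, cov_z)
total = total_genetic_variance(alpha, cov_z)
print(parts, total)
```

In this made-up example each of the two antagonistic loci has a squared effect of 1 but a fixed-model variance of only about 0.2, because most of its contribution is cancelled by the locus in LD with it.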
Regarding the extra computational cost of correcting the bias, there is no extra cost if the fixed model is used, because the correction only needs the intermediate results of QTL mapping and GWAS, i.e., the estimated QTL effect and the squared estimation error per locus. These intermediate results are often provided in the output files of QTL mapping and GWAS software packages. It may appear that the correction method requires the genotype indicator variable (Z) to be standardized prior to the data analysis. In fact, this assumption is made for ease of presentation and is not absolutely required. If the results of QTL mapping and GWAS are obtained from an unstandardized Z variable, we simply calculate the QTL variance using $\sigma_Z^2[\hat\alpha^2 - \mathrm{var}(\hat\alpha)]$, not $\hat\alpha^2 - \mathrm{var}(\hat\alpha)$. Calculation of $\sigma_Z^2$ for each locus presents some extra computational burden.
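The rescaling for an unstandardized genotype code can be sketched as below. This is a minimal illustration with a hypothetical function name and made-up numbers; it assumes the effect estimate and its squared standard error are read from the output of a mapping program, and that the variance of Z is the sample variance of the raw genotype codes.

```python
import statistics


def moment_qtl_variance(a_hat, var_a_hat, z_raw=None):
    """Bias-corrected QTL variance from an estimated effect and its squared error.

    If a_hat was estimated from an unstandardized genotype code, pass the raw
    codes in z_raw so the result is rescaled by the sample variance of Z.
    """
    corrected = a_hat ** 2 - var_a_hat
    if z_raw is None:
        return corrected
    return statistics.variance(z_raw) * corrected


# Hypothetical raw genotype codes and estimates from an unstandardized analysis.
z_raw = [1.0, 0.0, -1.0, 0.0, 1.0, -1.0, 0.0, 0.0]
from_raw = moment_qtl_variance(2.0, 0.5, z_raw)

# The same locus analyzed with a standardized code: the effect scales by sd(Z)
# and its squared error scales by var(Z).
var_z = statistics.variance(z_raw)
from_std = moment_qtl_variance(2.0 * var_z ** 0.5, 0.5 * var_z)
print(from_raw, from_std)
```

Both routes give the same QTL variance, which is the sense in which standardization is a convenience rather than a requirement.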

Genotype indicator variables and phenotypic values of 1000 grain weight (KGW) of rice from 278 hybrids.

Column 1 (hybrid): IDs of the 278 hybrid rice lines. Column 2 (bin): the bin ID (Bin 725), chosen from a total of 1619 bins. Column 3 (y): the average phenotypic value of KGW collected in 1998 and 1999. Column 4 (z0): the genotype indicator variable, coded 1, 0 and -1 for the three genotypes A, H and B, respectively. Column 5 (z): the standardized genotype indicator variable, z = (z0 - mean(z0))/stdev(z0). (XLSX)

Marker inferred kinship matrix among the 278 hybrid rice.

The matrix has been normalized so that all diagonal elements are unity. The first column (parm) holds a value of 1 and the second column (row) holds the row number of the kinship matrix. col1 to col278 represent the column names of the kinship matrix. This format is required by PROC MIXED (a SAS procedure). (XLSX)

Marker inferred kinship matrix among the 110 mice.

The matrix has been normalized so that all diagonal elements are unity. The first column (parm) holds a value of 1 and the second column (row) holds the row number of the kinship matrix. col1 to col110 represent the column names of the kinship matrix. This format is required by PROC MIXED and PROC GLIMMIX (SAS procedures). (CSV)

This file contains the genotype indicator variable (z0) and the standardized version of this variable (z).

It also contains the phenotypic values of the 10-week-body-weight trait for all the 110 mice (y). The x variable indicates the sex, 1 for male and 0 for female. The data are sorted by markers. (CSV)

SAS code to read the data and estimate the variance components from the fixed model and the random model of PROC MIXED.

The first block of code calls PROC MIXED with the QTL effect treated as a random effect. The second block of code calls PROC MIXED with the QTL effect treated as a fixed effect. (SAS)

SAS code to read the data and perform QTL mapping for the mouse population.

The file contains four blocks of SAS code, one for each of the four combinations of the two models (fixed and random models) and the two procedures (interval and polygenic mapping). PROC GLIMMIX was used to perform the QTL mapping. (SAS)

Description of the Bayesian method and the Bayesian estimate of QTL variance.

Table A: Estimated parameters for trait KGW from the fixed model, the random model and the Bayesian analysis for bin 725 of the IMF2 rice population. (DOCX)

24 Sep 2021 Submitted filename: Response-9-24-2021.docx

17 Nov 2021 Dear Prof. Xu, Thank you very much for submitting your manuscript "Estimating Genetic Variance Contributed by a Quantitative Trait Locus: A Random Model Approach" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.
Sincerely, Mingyao Li, Associate Editor, PLOS Computational Biology; Jian Ma, Deputy Editor, PLOS Computational Biology

Reviewer's Responses to Questions. Comments to the Authors:

Reviewer #1: This manuscript proposes a random-effect model approach to estimating the QTL variance. The method reformulates the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the bias by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies. The method has been applied to QTL mapping for the 10-week-body-weight trait from an F2 mouse population. The manuscript was well written, and developed a novel method that can be applied to many real data sets. I evaluate the work as a useful contribution and can be published. I have two comments. 1. The manuscript includes too many equations and derivations, some of which are easily derived and standard. I recommend simplify the mathematical presentation and thus improve readable. 2. The proposed random-effect assumes that the QTL effects follow a normal prior. What is the prior on the prior variance? You use an uniform prior. However, it results in a estimation towards zero. Gelman et al. Bayesian Data Analysis (Chapter 5) suggests weakly informative prior, which can solve the problem.
Reviewer #2: This manuscript presents an old but important question in QTL mapping and GWAS; i.e., how to correct biased genetic variance explained by a single QTL. The manuscript was well written and results are scientifically sound. I only have a minor comment. In this reviewer's opinion, mapping and estimating a single QTL is not meaningful in practical breeding schemes. As claimed by the authors, a small-effect but statistically significant QTL is not useful in practice. Yet, such QTLs are very commonly detected in plant, animal and human genetic studies. According to a recent study, an insignificant locus by statistical testing is not necessarily insignificant on its merit, rather its effect is compromised by negative regulators (Wang et al. 2021). I believe that an in-depth discussion on this issue is crucial for strengthening this manuscript's quality and impact. Wang HJ, et al. (2021) Modeling genome-wide by environment interactions through omnigenic interactome networks. Cell Reports 35: 109114. Reviewer #3: The paper tries to address bias in h2 estimate due to non Beavis effect theoretically and empirically. However, their results suggest that such bias is negligible as long as n is not too small. Given that Beavis effects over dominant non Beavis ones, practically, it is thus important to simultaneously correct both for studies with very small n, which should be properly addressed. Otherwise, this research has very limited practical usages. Some other concerns 1. Not agree with the comments made between lines 277-282. The variance in (10) is a conditional variance of y given X and Z. The conditional variance of y given X only will depend on Z, so does the marginal variance of y. 2. Equations (16)-(18) are very confusing. So are equations (18) and (19), and it is not clear what is the difference between the two hat estimates of alpha in these two equations. 3. Section 2 is very wordy and can be significantly shortened. 
The authors should only present the models and derive the biases concisely without lengthy explanations.

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? Reviewer #1: None. Reviewer #2: Yes. Reviewer #3: Yes.

Do you want your identity to be public for this peer review? Reviewer #1: No. Reviewer #2: No. Reviewer #3: No.
Submitted filename: review for PCOMPBIOL.docx

14 Dec 2021 Submitted filename: Response.docx

13 Feb 2022 Dear Prof. Xu, We are pleased to inform you that your manuscript 'Estimating Genetic Variance Contributed by a Quantitative Trait Locus: A Random Model Approach' has been provisionally accepted for publication in PLOS Computational Biology.
Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Mingyao Li Associate Editor PLOS Computational Biology Jian Ma Deputy Editor PLOS Computational Biology *********************************************************** Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: The authors have nicely addressed my previous comments. The revised manuscript has been improved. I have no further concerns. Reviewer #2: The authors have satisfactorily addressed my concern. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. 
Reviewer #1: None Reviewer #2: None ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No 4 Mar 2022 PCOMPBIOL-D-21-01711R1 Estimating Genetic Variance Contributed by a Quantitative Trait Locus: A Random Model Approach Dear Dr Xu, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Orsolya Voros PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Table 1

Estimated parameters of trait KGW from the fixed and random models for bin 725 of the rice population.

Parameter	Fixed model Estimate	Fixed model StdErr	Random model Estimate	Random model StdErr
α	0.5278	0.1122	0.5039	0.1096
σ_α²	0.2660	--	0.2660	0.3939
σ_ξ²	3.5054	0.5375	3.5038	0.5371
σ²	0.3842	0.0938	0.3845	0.0938
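
The fixed-model entry for the QTL variance in Table 1 is numerically consistent with a moment-type correction, in which the squared standard error is subtracted from the squared effect estimate because E(α̂²) = α² + var(α̂). The Python sketch below reproduces that arithmetic from the tabulated values. The function name `moment_corrected_variance` is a hypothetical helper introduced here for illustration, and the exact estimator used in the paper may differ in detail.

```python
def moment_corrected_variance(effect, std_err):
    """Return (naive, corrected) estimates of a QTL variance.

    naive     = effect**2
    corrected = effect**2 - std_err**2, which follows from
                E(a_hat^2) = a^2 + var(a_hat) for an unbiased a_hat.
    This is an illustrative assumption, not the authors' code.
    """
    naive = effect ** 2
    corrected = naive - std_err ** 2
    return naive, corrected


# Fixed-model estimate and standard error of alpha, taken from Table 1.
alpha_hat = 0.5278
se_alpha = 0.1122

naive, corrected = moment_corrected_variance(alpha_hat, se_alpha)
print(f"naive squared effect: {naive:.4f}")
print(f"moment-corrected:     {corrected:.4f}")
```

With the Table 1 inputs, the corrected value rounds to 0.2660, the figure reported for the QTL variance, whereas the naive squared effect is larger by the squared standard error.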

Table 2

Asymptotic variance-covariance matrix of the estimated variance components for trait KGW from the random model analysis (REML) for bin 725 of the rice population.

var(θ̂)	σ_α²	σ_ξ²	σ²
σ_α²	0.15520	0.00036	-0.00020
σ_ξ²	0.00036	0.28840	-0.02950
σ²	-0.00020	-0.02950	0.00881
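
The diagonal of Table 2 holds the asymptotic sampling variances of the REML variance-component estimates, so their square roots should recover the random-model standard errors in Table 1, up to rounding. The sketch below checks this and also converts the matrix to correlations. It uses only the Python standard library, and the helper names `standard_errors` and `correlation_matrix` are assumptions made for this illustration.

```python
import math

# Asymptotic variance-covariance matrix copied from Table 2.
# Row/column order: sigma_alpha^2, sigma_xi^2, sigma^2.
names = ["sigma_alpha^2", "sigma_xi^2", "sigma^2"]
cov = [
    [0.15520, 0.00036, -0.00020],
    [0.00036, 0.28840, -0.02950],
    [-0.00020, -0.02950, 0.00881],
]


def standard_errors(matrix):
    """Square roots of the diagonal elements."""
    return [math.sqrt(matrix[i][i]) for i in range(len(matrix))]


def correlation_matrix(matrix):
    """Scale each covariance by the product of the two standard errors."""
    se = standard_errors(matrix)
    n = len(matrix)
    return [[matrix[i][j] / (se[i] * se[j]) for j in range(n)]
            for i in range(n)]


se = standard_errors(cov)
corr = correlation_matrix(cov)

for name, s in zip(names, se):
    print(f"SE({name}) = {s:.4f}")
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"corr({names[i]}, {names[j]}) = {corr[i][j]:.3f}")
```

The computed standard errors agree with the random-model StdErr column of Table 1 to about three decimal places; small differences are expected because the entries of Table 2 are themselves rounded.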
