Bayesian robust analysis for genetic architecture of quantitative traits.

Runqing Yang, Xin Wang, Jian Li, Hongwen Deng.

Abstract

MOTIVATION: In most quantitative trait locus (QTL) mapping studies, phenotypes are assumed to follow normal distributions. Deviations from this assumption may affect the accuracy of QTL detection and lead to detection of spurious QTLs. To improve the robustness of QTL mapping methods, we replaced the normal distribution for residuals in multiple interacting QTL models with the normal/independent distributions that are a class of symmetric and long-tailed distributions and are able to accommodate residual outliers. Subsequently, we developed a Bayesian robust analysis strategy for dissecting genetic architecture of quantitative traits and for mapping genome-wide interacting QTLs in line crosses.
RESULTS: Through computer simulations, we showed that our strategy had power for QTL detection similar to that of traditional methods when traits were normally distributed, and substantially greater power for non-normal phenotypes. When this strategy was applied to a group of traits associated with physical/chemical characteristics and quality in rice, more main and epistatic QTLs were detected than with traditional Bayesian model analyses under the normal assumption.

Year: 2008 PMID: 18974168 PMCID: PMC2666810 DOI: 10.1093/bioinformatics/btn558

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

In experimental line crosses, most parametric methods for mapping quantitative trait loci (QTL) fall into one of three types of approach: least-squares, maximum likelihood or Bayesian. A common characteristic of these methods is that they all assume normally distributed phenotypes. However, many traits do not follow normal distributions. Some traits are inherently non-normal, such as survival time, while in other cases the deviation results from human measurement error. Deviation from the normality assumption can render many QTL mapping approaches inappropriate, reducing the accuracy and effectiveness of QTL detection (Coppieters et al., 1998) and producing unstable results due to outliers (Pinheiro et al., 2001). To improve robustness, various approaches have been developed to deal with non-normal phenotypes in QTL mapping. A simple approach is to adopt parametric methods known for their robustness. However, their robustness for non-normal phenotypes is difficult to establish (e.g. Coppieters et al., 1998; Hackett, 1997; Jansen, 1992; Rebaï, 1997). A second approach is to convert non-normal traits into approximately normal variables through mathematical transformation (Sokal and Rohlf, 1995; Yang et al., 2006). Distribution-free non-parametric methods have also been developed for mapping non-normal traits in various population structures (Coppieters et al., 1998; Elsen et al., 1999; Kruglyak and Lander, 1995; Zou et al., 2003). Yet another approach is to replace the normal assumption with other distributions that better fit the trait data (Diao et al., 2004; Feenstra and Skovgaard, 2004; Jansen, 1992; Symons et al., 2002). When the data are non-normal, assuming that the random effects and the residuals follow Gaussian distributions makes inferences vulnerable to the presence of outliers (Pinheiro et al., 2001).
To accommodate these outliers, some symmetric and long-tailed distributions, such as the Student's t-distribution (Dempster et al., 1980; Lange et al., 1989; Rogers and Tukey, 1972), have been suggested for robust estimation. The normal/independent distributions (Andrews and Mallows, 1974; Lange and Sinsheimer, 1993) are a class of symmetric and long-tailed distributions that have been used in linear regression models within a Bayesian framework (Liu, 1996). Fernandez and Steel (1998) applied inverse scaling of the probability density function on the left and on the right side of the mode to a symmetric heavy-tailed distribution, thereby capturing heavy tails and skewness simultaneously. Rohr and Hoeschele (2002) incorporated the approach of Fernandez and Steel into Bayesian QTL mapping, developing a robust Bayesian QTL mapping method that allows for non-normal, continuous distributions of phenotypes within QTL genotypes in single QTL models.
The genetic architecture of quantitative traits includes the number and locations of QTL and their main and epistatic effects. In particular, the unknown number of QTL and the potentially very large number of epistatic effects make the dissection of the genetic architecture of quantitative traits extremely complex. Fortunately, with a computationally efficient Markov chain Monte Carlo (MCMC) algorithm, Bayesian model selection frameworks have been developed for identifying epistatic QTL for complex traits (Yi et al., 2005, 2007). However, these approaches assume normal distributions, and the effects of deviation from this assumption have not been fully addressed. In this article, we develop a Bayesian robust analysis strategy for studying the genetic architecture of quantitative traits, combining the flexibility of the Bayesian approach in modeling multiple QTL and their interactions with the better phenotypic fit that symmetric, long-tailed distributions provide for non-normal traits.
We investigated the robustness of the proposed method by a series of simulations, and applied it to a real dataset in rice. Our method showed an improved power in mapping QTLs with non-normal phenotypes.

2 METHOD

2.1 Genetic model

For simplicity, we consider a mapping population with only two segregating genotypes, e.g. a backcross, doubled haploid lines (DHLs) or recombinant inbred lines (RILs). However, the method can be applied to other experimental designs, such as an F2 design. Phenotypes and molecular marker data are collected on n individuals. Assuming that q QTLs are responsible for a trait of interest, the phenotypic value $y_i$ of individual i can be described by the following multiple interacting QTL model:

$$y_i = \mu + \sum_{j=1}^{q} \gamma_j \alpha_j x_{ij} + \sum_{j=1}^{q-1}\sum_{k=j+1}^{q} \gamma_{jk}\,\delta_{jk}\, z_{ijk} + \varepsilon_i \qquad (1)$$

where $\mu$ is the population mean; $\alpha_j$, for j=1, 2,…, q, is the additive effect of the j-th QTL; $\delta_{jk}$ is the epistatic effect between the j-th and k-th QTL, for j=1, 2,…, q and k=j+1, j+2,…, q. The variable $x_{ij}$ is a genotype indicator for individual i at locus j, defined as 1 for one genotype and −1 for the other, and $z_{ijk}=x_{ij}x_{ik}$. $\gamma_\bullet$ is a binary variable for each genetic effect (additive or epistatic), indicating whether the corresponding effect is included ($\gamma_\bullet=1$) in or excluded ($\gamma_\bullet=0$) from model (1). By inferring the $\gamma_\bullet$, we use Bayesian model selection to carry out MCMC sampling in a reduced model space. Finally, $\varepsilon_i$ is a random environmental error.
To accommodate outliers from non-normally distributed phenotypes, we introduce the normal/independent distributions to describe the random environmental errors, denoted by $\varepsilon_i = e_i/\sqrt{w_i}$, where $e_i \sim N(0, \sigma^2)$ and $w_i$ is a positive random variable with density $p(w_i \mid df)$, with df being a scalar parameter. The type of normal/independent distribution depends on the distribution of $w_i$. For instance, if $w_i$ is taken to be Gamma(df/2, df/2), the normal/independent distribution becomes a t-distribution; $p(w_i \mid df) = df\, w_i^{df-1}$ for $0 < w_i \le 1$ results in a slash distribution (Lange and Sinsheimer, 1993; Rogers and Tukey, 1972); and the contaminated normal distribution arises when $w_i = \tau$ with probability v and $w_i = 1$ with probability 1−v, with 0≤v<1 and 0<τ<1, where v and τ are scalar parameters (Little, 1988). These three distributions are the most common long-tailed distributions for robust inference.
Clearly, the normal model is the special case obtained by taking $w_i=1$ for all i.
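As an illustration of the three error families, the sketch below draws residuals by dividing a normal deviate by the square root of a mixing weight. It assumes the usual normal/independent construction (error = e/√w) and the weight distributions named above; the function names and the default values of `df`, `v` and `tau` are hypothetical choices made only for this example.

```python
import numpy as np

def sample_weights(kind, n, rng, df=3.0, v=0.1, tau=0.2):
    """Draw mixing weights w_i for one normal/independent family (illustrative)."""
    if kind == "normal":
        return np.ones(n)                       # w_i = 1 recovers the normal model
    if kind == "t":
        # Gamma(shape=df/2, rate=df/2); NumPy parameterizes by scale = 1/rate.
        return rng.gamma(shape=df / 2.0, scale=2.0 / df, size=n)
    if kind == "slash":
        # Density df * w**(df-1) on (0, 1]: inverse-CDF draw, U**(1/df).
        u = 1.0 - rng.uniform(size=n)           # uniform on (0, 1], avoids w = 0
        return u ** (1.0 / df)
    if kind == "contaminated":
        # w = tau with probability v, otherwise w = 1.
        return np.where(rng.uniform(size=n) < v, tau, 1.0)
    raise ValueError(f"unknown family: {kind}")

def sample_errors(kind, n, sigma2, rng, **kw):
    """Return (errors, weights) with error_i = e_i / sqrt(w_i), e_i ~ N(0, sigma2)."""
    w = sample_weights(kind, n, rng, **kw)
    e = rng.normal(0.0, np.sqrt(sigma2), size=n)
    return e / np.sqrt(w), w
```

Small weights inflate the variance of individual errors, which is how these families generate occasional outliers while remaining symmetric around zero.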

2.2 Likelihood function

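The probability distribution of the phenotype data conditional on all parameters is called the likelihood. Based on model (1), the conditional density of all phenotypes, given the parameters, is

$$p(y \mid \mu, \lambda, \beta, \gamma, \sigma^2, w) = \prod_{i=1}^{n} \left(\frac{w_i}{2\pi\sigma^2}\right)^{1/2} \exp\left\{-\frac{w_i}{2\sigma^2}\Bigl(y_i - \mu - \sum_{j=1}^{q}\gamma_j\alpha_j x_{ij} - \sum_{j=1}^{q-1}\sum_{k=j+1}^{q}\gamma_{jk}\delta_{jk} z_{ijk}\Bigr)^2\right\}$$

where $y=\{y_i\}$, $\lambda=\{\lambda_j\}$, $\beta=\{\alpha_j, \delta_{jk}\}$, $\gamma=\{\gamma_j, \gamma_{jk}\}$ and $w=\{w_i\}$, for i=1, 2,…, n, j=1, 2,…, q and k=j+1, j+2,…, q.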
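A minimal sketch of this likelihood on the log scale is given below. It assumes that each phenotype is normal around the fitted value of model (1) with variance σ²/w_i, which follows from writing the error as e/√w; the function name and the argument layout are hypothetical.

```python
import numpy as np

def log_likelihood(y, x, mu, alpha, gamma_a, pairs, delta, gamma_e, sigma2, w):
    """Log conditional density of all phenotypes under model (1) (illustrative).

    x       : (n, q) genotype codes in {-1, +1}
    alpha   : additive effects, gamma_a their 0/1 inclusion indicators (length q)
    pairs   : list of (j, k) locus index pairs with epistatic effects
    delta   : epistatic effects, gamma_e their 0/1 indicators (one per pair)
    w       : (n,) positive weights; w = 1 gives the ordinary normal model
    """
    fitted = mu + x @ (gamma_a * alpha)
    for (j, k), d, g in zip(pairs, delta, gamma_e):
        fitted = fitted + g * d * x[:, j] * x[:, k]    # z_ijk = x_ij * x_ik
    var = sigma2 / w                                   # per-individual variance
    resid = y - fitted
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * resid**2 / var))
```

An observation with a large residual contributes far less penalty when its weight is small, which is the mechanism that makes the model tolerant of outliers.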

2.3 Prior distribution

As described by Yi et al. (2005), we take L, the maximal number of QTLs, as $L = l_0 + 3\sqrt{l_0}$, where $l_0$ is the prior expected number of all QTLs, including main-effect and epistatic QTLs, which is determined using traditional methods. The binary indicator γ has an independent prior $p(\gamma)=\prod p_\bullet^{\gamma_\bullet}(1-p_\bullet)^{1-\gamma_\bullet}$, where $p_\bullet$ is the prior inclusion probability for a given QTL effect and equals a predetermined hyper-parameter, $p_m$ for a main effect or $p_e$ for an epistatic effect. The population mean μ is assumed to have the prior p(μ) ∝ constant. A hierarchical mixture model is proposed as the prior distribution for each genetic effect, denoted by $\alpha_j \mid (\gamma_j, \sigma^2, x_{\bullet j}) \sim N\bigl(0, \gamma_j c\,(\sum_i w_i x_{ij}^2)^{-1}\sigma^2\bigr)$ for additive effects and $\delta_{jk} \mid (\gamma_{jk}, \sigma^2, z_{\bullet jk}) \sim N\bigl(0, \gamma_{jk} c\,(\sum_i w_i z_{ijk}^2)^{-1}\sigma^2\bigr)$ for epistatic effects, where c takes a value such that the prior variance of each QTL effect stays approximately the same as n increases. Here, we let c=n. A scaled inverse-χ² distribution with hyper-parameters v and s is adopted as the prior for σ², i.e. $\sigma^2 \sim \text{Inv-}\chi^2(v, s)$. The prior for the scalar parameter df is specified according to the type of normal/independent distribution used for the residual error. The detailed specification of this prior is given in Appendix A. The prior for the position of the j-th QTL is $p(\lambda_j)=1/d_j$, where $d_j$ is the length of the interval, bounded by markers or adjoining QTLs, in which the j-th QTL resides.
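Two of these prior settings can be checked with a few lines of arithmetic. The sketch below assumes the upper bound of Yi et al. (2005) has the form L = l0 + 3√l0, truncated to an integer (rounding gives the same value here), and evaluates the prior variance of an effect when c = n. Function names are hypothetical.

```python
import math
import numpy as np

def qtl_upper_bound(l0):
    """Assumed form L = l0 + 3*sqrt(l0), truncated to an integer."""
    return int(l0 + 3.0 * math.sqrt(l0))

def effect_prior_variance(x_col, w, sigma2, c=None):
    """Prior variance c * sigma2 / sum_i(w_i * x_ij**2); c defaults to n."""
    c = len(x_col) if c is None else c
    return c * sigma2 / float(np.sum(w * x_col**2))

# With l0 = 8 (the value used for the rice data) the bound is 16.
L_rice = qtl_upper_bound(8)

# With genotype codes of +/-1 and unit weights, sum_i w_i x_ij^2 = n, so c = n
# gives a prior variance equal to sigma2 for any sample size.
for n in (150, 300):
    x_col = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    assert abs(effect_prior_variance(x_col, np.ones(n), 3.0) - 3.0) < 1e-12
```

The second check shows why c = n keeps the prior variance of an effect roughly constant as the sample grows: the sum in the denominator scales with n as well.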

2.4 Posterior distribution and MCMC sampling

The joint posterior density of all unknown parameters is then

$$p(\theta \mid y, m) \propto p(y \mid \mu, \lambda, \beta, \gamma, \sigma^2, w)\; p(\theta \mid m) \qquad (2)$$

where m is the known marker information, θ collects all unknowns (μ, λ, β, γ, σ², w, df and the QTL genotypes $x_j=\{x_{ij}\}$ for j=1, 2,…, q), and $p(\theta \mid m)$ is the joint prior, built from the priors of Section 2.3 and the distribution of the QTL genotypes given their positions and the markers. To implement Bayesian estimation via MCMC, the fully conditional posterior distribution of each parameter is derived from the joint posterior density (2) by holding the other parameters fixed.
For convenience, let $r_i$ denote the phenotype of individual i adjusted for every term of model (1) except the one being updated. The fully conditional posterior density of the population mean μ, given all other parameters, can be shown to be a normal distribution with mean $\sum_i w_i r_i / \sum_i w_i$ and variance $\sigma^2 / \sum_i w_i$. Conditionally on all other parameters, the QTL effects are mutually independent. In particular, the fully conditional posterior distribution of $\alpha_j$ is normal with mean $(1+c^{-1})^{-1}(\sum_i w_i x_{ij}^2)^{-1}\sum_i w_i x_{ij} r_i$ and variance $(1+c^{-1})^{-1}(\sum_i w_i x_{ij}^2)^{-1}\sigma^2$. Likewise, the conditional posterior distribution of $\delta_{jk}$ is normal with mean $(1+c^{-1})^{-1}(\sum_i w_i z_{ijk}^2)^{-1}\sum_i w_i z_{ijk} r_i$ and variance $(1+c^{-1})^{-1}(\sum_i w_i z_{ijk}^2)^{-1}\sigma^2$, for j=1, 2,…, q and k=j+1, j+2,…, q. For the residual variance σ², the fully conditional distribution is a scaled inverse-χ² whose degrees-of-freedom parameter is v+n.
Note that $w_i$ can be interpreted as a 'weight'. The specific form of the posterior for $w_i$ depends on the normal/independent distribution adopted, and the posterior for the degrees of freedom df depends on the form of the corresponding prior distribution (detailed in Appendix B). The conditional posterior distribution of $\gamma_\bullet$ is Bernoulli, with an inclusion probability that depends on the prior inclusion probability $p_\bullet$, where $p_\bullet=p_m$ for an additive effect and $p_\bullet=p_e$ for an epistatic effect. If $\gamma_\bullet$ is sampled to be zero, the corresponding α or δ is set to 0. Otherwise, α or δ is drawn from its conditional posterior.
Only the positions of QTL for which the corresponding $\gamma_\bullet=1$, for either a main or an epistatic effect, are sampled. Since the genotype of a QTL (x) depends on its position (λ), we sample {λ, x} jointly as a block, one locus at a time. Each locus is sampled from a variable interval (Wang et al., 2005; Zhang and Xu, 2005) whose boundaries are the positions of the adjoining QTLs.
The prior distribution of $\lambda_j$ can be written as $p(\lambda_j)=1/(\lambda_R-\lambda_L)$, where $\lambda_L$ and $\lambda_R$ are the positions of the left and the right QTL. Let $\lambda^{(t)}$ be the current position of the locus of interest and $x^{(t)}=[x_1 \ldots x_n]$ be the genotype array of all individuals at the locus. We first sample a new position for the QTL, called the proposed position and denoted by $\lambda^*=\lambda^{(t)}+\delta$, where the increment δ (not to be confused with an epistatic effect) is sampled from U(−s, s) and s is a small positive number (tuning parameter), such as 1 cM. For the new position, we simulate the genotypes for all individuals, denoted by $x^*$. We then use the Metropolis–Hastings (M–H) rule to decide whether $\lambda^*$ should be accepted. If $\lambda^*$ is accepted, we update both the position and the genotypes using $\lambda^{(t+1)}=\lambda^*$ and $x^{(t+1)}=x^*$. Otherwise, the old values are carried over, so that $\lambda^{(t+1)}=\lambda^{(t)}$ and $x^{(t+1)}=x^{(t)}$. The detailed formula of the M–H acceptance rule can be found in Wang et al. (2005) and Zhang and Xu (2005).
Genotypes of missing markers are generated randomly in each iteration on the basis of the probability inferred jointly from the nearest non-missing flanking markers and the phenotype. The probability from the markers is treated as the prior probability. After incorporation of the marker (QTL) effects through the phenotype, the probability becomes the posterior probability, which is used to generate the missing marker genotype. See Wang et al. (2005) for details.
In summary, the MCMC process consists of the following steps:
(1) Initialize all variables with some legal values or values sampled from their prior distributions.
(2) Update the population mean μ.
(3) Update the binary indicators γ.
(4) Update the additive QTL effects $\alpha_j$ for which $\gamma_j=1$.
(5) Update the epistatic QTL effects $\delta_{jk}$ for which $\gamma_{jk}=1$.
(6) Update the residual variance σ².
(7) Update the degrees of freedom df in the t-distribution or slash distribution, or v in the contaminated normal distribution.
(8) Update the 'weights' $w_i$ (i=1, 2,…, n).
(9) Update the QTL positions λ for which $\gamma_\bullet=1$ and the genotypes for those QTLs.
(10) Impute the genotypes of missing markers.
Repeat steps (2)–(10) until the Markov chain reaches a desirable length.
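The flavor of steps (2), (4), (6) and (8) can be conveyed with a deliberately simplified Gibbs sampler. This is a sketch under several assumptions that differ from the method described above: genotypes are fixed and fully observed, all additive effects stay in the model (no indicators, no epistasis, no position updates), the residual follows a t-distribution with a fixed df, each effect has an independent N(0, tau0_sq) prior, and the prior on σ² is proportional to 1/σ². All names are hypothetical.

```python
import numpy as np

def robust_gibbs(y, x, df=3.0, tau0_sq=10.0, n_iter=2000, burn_in=500, seed=0):
    """Simplified Gibbs sampler for y = mu + x @ alpha + e / sqrt(w), t residuals."""
    rng = np.random.default_rng(seed)
    n, q = x.shape
    mu, alpha = float(np.mean(y)), np.zeros(q)
    sigma2, w = float(np.var(y)), np.ones(n)
    draws = []
    for it in range(n_iter):
        # Population mean: weighted average of the adjusted phenotypes.
        r = y - x @ alpha
        mu = rng.normal(np.sum(w * r) / np.sum(w), np.sqrt(sigma2 / np.sum(w)))
        # Additive effects, one locus at a time.
        for j in range(q):
            r_j = y - mu - x @ alpha + x[:, j] * alpha[j]   # remove all but locus j
            s_j = np.sum(w * x[:, j] ** 2)
            var_j = 1.0 / (s_j / sigma2 + 1.0 / tau0_sq)
            mean_j = var_j * np.sum(w * x[:, j] * r_j) / sigma2
            alpha[j] = rng.normal(mean_j, np.sqrt(var_j))
        # Residual variance: scaled inverse chi-square with n degrees of freedom.
        resid = y - mu - x @ alpha
        sigma2 = np.sum(w * resid ** 2) / rng.chisquare(n)
        # Weights: Gamma((df + 1) / 2, rate = (df + resid^2 / sigma2) / 2).
        rate = 0.5 * (df + resid ** 2 / sigma2)
        w = rng.gamma(shape=0.5 * (df + 1.0), scale=1.0 / rate)
        if it >= burn_in:
            draws.append(np.concatenate(([mu], alpha, [sigma2])))
    return np.asarray(draws)   # columns: mu, alpha_1..alpha_q, sigma2
```

Individuals with large standardized residuals receive small weights on the next sweep, so they pull less on the updates of the mean and the effects. The full method additionally samples the indicators, the epistatic effects, df, the QTL positions and the missing genotypes.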

2.5 Post-MCMC analysis

The posterior sample can be used to infer the genetic architecture of a quantitative trait. Before doing so, we need to monitor the mixing behavior and convergence of the MCMC algorithm, either by visually inspecting trace plots of the sampled values of scalar quantities of interest or by using the formal diagnostic methods provided in the package R/coda (Plummer et al., 2004). Model averaging accounts for model uncertainty and provides more robust inference than a single optimal model approach (Ball, 2001; Raftery et al., 1997; Sillanpää and Corander, 2002). Therefore, we employ model averaging to assess characteristics of the genetic architecture by averaging over possible models weighted by their posterior probabilities. Various methods can be used to summarize and interpret the posterior samples graphically and numerically. The posterior inclusion probability for each locus is estimated as its frequency in the posterior samples. Taking the prior probability into consideration, we use Bayes factors (BFs) to show evidence for inclusion against exclusion of each locus or effect. The BF for a locus or effect is defined as the ratio of the posterior odds to the prior odds for inclusion against exclusion of the locus or effect within each chromosomal interval of 1–2 cM (Kass and Raftery, 1995). Generally, a BF threshold of 3, or 2logBF=2.1, is used empirically for declaring statistical significance for each locus or effect (Kass and Raftery, 1995).
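The Bayes factor defined above is simple to compute from a posterior inclusion frequency and a prior inclusion probability. The sketch below uses made-up indicator draws and a made-up prior probability purely for illustration; function names are hypothetical. With natural logarithms, a BF of 3 corresponds to 2 ln 3, which is about 2.20.

```python
import math

def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds divided by prior odds for inclusion against exclusion.

    Undefined when either probability is exactly 0 or 1.
    """
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

def two_log_bf(posterior_prob, prior_prob):
    """Twice the natural logarithm of the Bayes factor."""
    return 2.0 * math.log(bayes_factor(posterior_prob, prior_prob))

# Posterior inclusion probability = frequency of gamma = 1 in the saved sample.
indicator_draws = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # made-up draws for illustration
posterior_inclusion = sum(indicator_draws) / len(indicator_draws)
bf_example = bayes_factor(posterior_inclusion, prior_prob=0.25)
```

Because the BF divides out the prior odds, a locus with a modest posterior frequency can still show strong evidence when its prior inclusion probability was small.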

3 SIMULATION STUDIES

For convenience of programming, we simulated 61 equally spaced co-dominant markers on a single large chromosome of length 500 cM for a backcross population with sample sizes of 150 and 300. We simulated four QTLs, two pairs of which were assumed to interact. The total genetic variance contributed by all main-effect and epistatic QTLs was 45.06, where the proportion of phenotypic variance contributed by an individual QTL ranged from 0.95% to 11.63%. The population mean and the residual variance were set at μ=5.0 and σ²=3.0. We used non-Bayesian and Bayesian methods to analyze the simulated data. Non-Bayesian mapping was implemented with the EM algorithm through a two-dimensional scan, and the effects of detected QTL were estimated using multiple QTL imputation. The critical values at the 5% significance level, obtained with 1000 permutations, were 3.9 for main effects and 6.7 for epistatic effects. In all Bayesian mapping analyses, we set the prior expected number of main-effect QTL at three and the prior expected number of epistatic QTL at three, which fixes the upper bound L on the number of QTL through its definition in Section 2.3. The hyper-parameters were set to v=0 and s=1, and a=1 and b=0.01. The initial values of all variables were sampled from their prior distributions. The MCMC was run for 6000 cycles as a burn-in period (discarded) and then for an additional 100 000 cycles. The length of the burn-in was judged by visually inspecting plots of some posterior samples across cycles. The chain was then thinned to reduce serial correlation by saving one observation in every 40 cycles, so that the posterior sample contained 2500 observations for the post-MCMC analysis. Because each simulation is time-consuming, the simulation experiment was replicated 50 times for statistical power evaluation.
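The quoted range of variance proportions and the size of the thinned sample can be reproduced with a short calculation. The sketch assumes the true effects listed in Table 3, genotype codes of ±1 with equal frequencies (so each term of model (1) contributes its squared effect to the variance), no covariance between loci, and the nominal residual variance of 3.0. Variable names are hypothetical.

```python
# True effects from Table 3 (QTL 5 and 6 are the two epistatic pairs).
effects = {"QTL1": 0.45, "QTL2": 0.70, "QTL3": 0.30,
           "QTL4": 0.55, "QTL5": 0.30, "QTL6": 0.20}
residual_variance = 3.0

# With +/-1 coding and equal genotype frequencies, var(effect * x) = effect**2.
genetic = {name: a * a for name, a in effects.items()}
phenotypic = sum(genetic.values()) + residual_variance

# Percentage of phenotypic variance explained by each QTL.
share = {name: 100.0 * v / phenotypic for name, v in genetic.items()}
largest = round(max(share.values()), 2)    # QTL2
smallest = round(min(share.values()), 2)   # QTL6

# Thinning: 100 000 post-burn-in cycles, one saved in every 40.
saved_observations = 100_000 // 40
```

Under these assumptions the largest and smallest shares come out at 11.63% and 0.95%, matching the range stated in the text, and the thinned chain holds 2500 observations.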
To demonstrate the flexibility of the proposed Bayesian robust mapping, we drew residual errors from a t-distribution with df=3 to generate the two samples of different size, according to model (1). These data were analyzed with Bayesian robust mapping using a t-distribution, a slash distribution and a contaminated normal distribution for the residuals, and with traditional Bayesian and non-Bayesian mapping procedures using normal residuals. The statistical powers of all the methods for QTL detection are given in Table 1. In general, Bayesian robust mapping has higher statistical power for QTL detection than traditional Bayesian and non-Bayesian mapping when the residual error follows a heavy-tailed distribution. The estimates of the positions and effects of QTL detected by all methods are fairly close to the true parameter values. As expected, the model is more robust with increased heritability and sample size (Tables 2 and 3). Statistical power of QTL detection increases as the sample size and the proportion of variance explained by a QTL increase. The type I error rates of all methods are at most 6%. On the whole, as statistical power rises, the error rate falls.
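When reading the power estimates, it helps to keep their Monte Carlo precision in mind. The sketch below computes an empirical power from per-replicate detection flags and its binomial standard error, assuming independent replicates; the flags are made up for illustration and the function names are hypothetical.

```python
import math

def power_percent(detected):
    """Percentage of replicates in which the QTL was declared significant."""
    return 100.0 * sum(detected) / len(detected)

def monte_carlo_se_percent(power_pct, n_rep):
    """Binomial standard error of an estimated power, in percentage points."""
    p = power_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_rep)

# Made-up example: a QTL detected in 35 of 50 replicates.
flags = [True] * 35 + [False] * 15
estimated_power = power_percent(flags)
standard_error = monte_carlo_se_percent(estimated_power, len(flags))
```

With 50 replicates, powers move in steps of 2 percentage points and an estimate near 70% carries a standard error of roughly 6.5 points, so small differences between methods should be read with caution.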

Table 1.

Statistical power of QTL detection (%) and type I error rate (%, in the last column) obtained by various mapping methods

Sample size	Distribution	QTL no.
		1	2	3	4	5	6	Type I error
150	t	70	100	48	92	56	36	2
	Slash	62	92	26	90	20	20	4
	Contaminated	60	80	30	84	20	16	4
	Normal	36	74	8	80	6	2	6
	Non-Bayesian	16	28	0	32	4	0	6
300	t	100	100	82	100	84	64	0
	Slash	96	100	74	100	84	54	2
	Contaminated	76	100	42	100	36	34	2
	Normal	50	90	36	80	20	30	4
	Non-Bayesian	44	70	30	78	20	18	4

Table 2.

Mean estimates and SDs (in parentheses) of QTL positions detected by various mapping methods

Sample size	Distribution	QTL no.
		1	2	3	4	5	6
150	True position	56	148	267	359	56×267	148×359
	t	55.3 (5.1)	148.9 (2.4)	268.2 (5.6)	358.9 (3.5)	57.8 (11.0)×267.9 (8.8)	151.3 (7.7)×356.9 (6.3)
	Slash	54.2 (4.8)	148.4 (3.4)	268.4 (3.0)	358.7 (4.9)	58.1 (8.9)×265.7 (9.2)	150.1 (7.0)×358.2 (7.7)
	Contaminated	56.2 (5.9)	147.9 (4.3)	269.0 (7.5)	359.8 (3.9)	57 (13.3)×263.8 (12.7)	148.0 (6.8)×360.9 (9.2)
	Normal	52.6 (4.2)	148.1 (4.9)	258.0 (9.8)	359.4 (3.6)	56.1 (13.0)×264.6 (15.2)	143.0 (–)×360.0 (–)
	Non-Bayesian	55.7 (6.9)	150.2 (5.4)	–	361.3 (5.8)	61.2 (15.1)×268.6 (18.4)	–
300	t	57.6 (2.9)	148.3 (3.1)	266.4 (3.5)	357.5 (2.7)	58.4 (5.3)×265.4 (7.8)	149.8 (4.5)×359.3 (3.9)
	Slash	55.9 (3.1)	149.4 (2.5)	266.3 (4.6)	357.9 (2.4)	57.4 (3.8)×266.2 (7.3)	150.6 (4.8)×359.0 (5.1)
	Contaminated	56.0 (3.5)	146.4 (2.9)	264.3 (3.5)	357.8 (3.0)	57.7 (8.8)×269.0 (9.9)	149.0 (3.5)×359.2 (5.4)
	Normal	57.4 (3.9)	147.9 (2.4)	264.0 (6.1)	359.4 (3.2)	52.4 (10.1)×270.5 (10.5)	145.0 (8.0)×358.4 (8.1)
	Non-Bayesian	57.1 (4.1)	149.5 (3.3)	266.1 (7.3)	359.0 (3.4)	54.4 (13.6)×268.2 (10.1)	151.7 (9.1)×360.8 (7.4)

Table 3.

Mean estimates and SDs (in parentheses) of QTL effects detected by various mapping methods

Sample size	Distribution	QTL no.
		1	2	3	4	5	6
150	True Effect	0.45	0.70	0.30	0.55	0.30	0.20
	t	0.50 (0.09)	0.73 (0.10)	0.35 (0.06)	0.57 (0.14)	0.25 (0.09)	0.23 (0.10)
	Slash	0.51 (0.10)	0.77 (0.13)	0.38 (0.04)	0.54 (0.09)	0.23 (0.10)	0.27 (0.13)
	Contaminated	0.51 (0.12)	0.76 (0.14)	0.39 (0.17)	0.62 (0.10)	0.37 (0.12)	0.26 (0.14)
	Normal	0.56 (0.20)	0.74 (0.22)	0.46 (0.29)	0.63 (0.14)	0.39 (0.20)	0.31 (–)
	Non-Bayesian	0.81 (0.52)	1.04 (0.44)	–	0.87 (0.43)	0.68 (0.48)	–
300	t	0.46 (0.07)	0.70 (0.08)	0.33 (0.13)	0.57 (0.08)	0.26 (0.07)	0.23 (0.08)
	Slash	0.45 (0.09)	0.72 (0.09)	0.35 (0.07)	0.56 (0.08)	0.25 (0.09)	0.25 (0.09)
	Contaminated	0.45 (0.09)	0.70 (0.12)	0.39 (0.18)	0.60 (0.14)	0.35 (0.09)	0.25 (0.12)
	Normal	0.52 (0.19)	0.72 (0.14)	0.41 (0.28)	0.61 (0.18)	0.36 (0.18)	0.28 (0.17)
	Non-Bayesian	0.78 (0.41)	0.89 (0.30)	0.58 (0.38)	0.83 (0.35)	0.64 (0.42)	0.51 (0.29)

We further generated normally distributed phenotypes by sampling residuals from a normal distribution and analyzed them with both Bayesian robust mapping and traditional Bayesian mapping. The results (provided in Section 1 of the Supplementary Material) indicated that, for normally distributed data, the Bayesian robust analysis had power similar to that of traditional Bayesian mapping.

4 REAL DATA ANALYSIS

A population of 162 F10 RILs derived from the cross Dasanbyeo (a Korean tongil type rice) × TR22183 (a Chinese japonica variety) was designed for mapping QTL for traits associated with the physical/chemical characteristics and quality of rice. Based on this population, we constructed a framework linkage map 1437.5 cM long using 208 SSR and STS markers. The map consists of 16 linkage groups (LGs) for each parental map. We analyzed the data with Bayesian robust mapping under the different types of distribution and with the traditional Bayesian mapping procedure with normal residuals. In all Bayesian analyses, based on results from non-epistatic interval mapping (Lander and Botstein, 1989) and a two-dimensional genome scan, the prior number of main-effect QTL was set at l=3 and the prior expected number of all QTL (l0) was taken to be l+5. The upper bound of the number of QTL, L, was then 16. The initial value of each unknown parameter was chosen as in the simulation study. The MCMC was run for 200 000 cycles after a burn-in of 6000 cycles. For 13 of the 21 traits of interest, the mapping results supported the Bayesian robust mapping procedure. Here, we take peak viscosity (PKV) as an example trait to compare the mapping results based on different residual distributions. The estimated positions and genetic effects of the QTL detected with Bayesian robust mapping and with the traditional Bayesian mapping method are listed in Table 4 (positions) and Table 5 (effects).
The QTL detected under the different distributions overlap but differ in number. Bayesian robust mapping with a t-distribution identified three main-effect QTLs and seven pairs of epistatic QTLs, covering all QTL detected by the other methods. It was followed by one main-effect QTL and four pairs of epistatic QTLs with the slash distribution for residuals, one main-effect QTL and three pairs of epistatic QTLs with the contaminated normal distribution, and one main-effect QTL and two pairs of epistatic QTLs with the normal distribution, whereas the non-Bayesian method detected only one main-effect QTL, on the seventh LG. This suggests that the Bayesian robust analysis has higher power than traditional Bayesian model selection and the non-Bayesian method. Most of the main-effect and epistatic QTLs increase the PKV in rice, except for the third main-effect QTL and the ninth pair of QTLs. All three possible configurations of two QTLs involved in an epistatic effect are found: (1) both QTLs have main effects, as in the fourth and eighth pairs of QTL; (2) neither QTL has a main effect, as in the seventh pair of QTL; and (3) only one of the two QTLs has a main effect, as in the remaining pairs. Figures 1 and 2 (in Section 2 of the supplementary data) plot, respectively, the one-dimensional profiles of BFs for main effects and the two-dimensional profiles of BFs for epistatic effects obtained from Bayesian robust mapping with a t-distribution for residuals. They illustrate the results of the Bayesian robust analysis of the genetic architecture of quantitative traits.

Table 4.

Estimated QTL positions (LG-position) obtained from Bayesian robust mapping with different distribution for residual on PKV in rice

QTL no.	Distribution
	t	Slash	Contaminated	Normal
1	1-438.7	–	–	–
2	7-327.6	7-320.9	7-326.2	7-322.7
3	16-164.5	–	–	–
4	(1-435.9)×(16-162.8)	(1-440.8)×(16-183.6)	(1-439.2)×(16-175.2)	–
5	(1-309.4)×(12-11.5)	(1-302.1)×(12-13.2)	–	–
6	(1-443.2)×(6-23.8)	(1-447.5)×(6-33.2)	(1-436.2)×(6-32.6)	(1-450.8)×(6-30.7)
7	(1-65.6)×(1-253.2)	–	–	–
8	(7-327.6)×(16-164.5)	–	–	–
9	(4-24.8)×(16-160.8)	–	(4-28.3)×(16-162.1)	–
10	(9-27.3)×(16-168.7)	(9-25.9)×(16-175.1)	–	(9-28.4)×(16-162.1)

Table 5.

Estimated QTL effects obtained from Bayesian robust mapping with different distribution for residual on PKV in rice

QTL no.	QTL type	Distribution
		t	Slash	Contaminated	Normal
1	Main Effect	0.46(1.96)	–	–	–
2	Main Effect	10.05(5.65)	9.54(4.38)	9.82(5.13)	9.61(2.46)
3	Main Effect	−4.77(2.77)	–	–	–
4	Epistatic	13.46(9.03)	13.98(7.56)	12.95(8.78)	–
5	Epistatic	9.00(5.13)	10.36(6.13)	–	–
6	Epistatic	7.07(4.29)	7.45(5.82)	7.56(5.13)	7.31(4.69)
7	Epistatic	8.06(3.17)	–	–	–
8	Epistatic	2.73(3.45)	–	–	–
9	Epistatic	−5.46(3.18)	–	−4.98(3.89)	–
10	Epistatic	3.04(2.55)	3.95(2.41)	–	2.59(4.02)

The numbers in parentheses are the 2logBF values.

5 DISCUSSION

Within the framework of Bayesian model selection for mapping genome-wide interacting QTLs, we developed a Bayesian robust mapping strategy for analyzing continuous non-normal quantitative traits by replacing the normal distribution for residuals in the multiple QTL model with the normal/independent distributions. Compared with Bayesian mapping for normal data, the Bayesian robust mapping strategy additionally samples the 'weights' w and the robustness parameter df with the Gibbs sampler or the Metropolis–Hastings algorithm in the MCMC process. Although the robust models require more computation than their normal counterparts, the flexibility of Bayesian robust mapping for either non-normal or normal data is enough to compensate for the cost. Of course, if the robustness parameter is assumed to be known, e.g. simply fixed at a small value (Gelman et al., 1995), the implementation of Bayesian robust mapping becomes even easier. In practice, however, unless there is a strong reason to believe in the adequacy of the normality assumption for residuals, it may be safer to use a robust model (Rosa et al., 2003, 2004).
Besides the three common normal/independent distributions discussed in this study, other distributions can also be considered, such as the Laplace (double exponential) distribution. Which distribution is optimal for fitting residuals depends on peculiarities of the dataset, such as the proportion of outliers and how far these are from the 'center' of the distribution. The t-distribution is the most commonly used thick-tailed distribution for robust inference, and is often a good alternative to a normal distribution. The contaminated normal distribution is the most flexible among the three robust distributions, but at the expense of an additional parameter. The slash distribution, although not often encountered in the literature, is the easiest one to implement in hierarchical modeling, because all conditional posterior distributions have closed forms.
Rohr and Hoeschele (2002) first implemented a robust Bayesian method for mapping QTL. Their study differs from ours in that: (1) their mapping analysis was aimed at outbred populations whereas ours is aimed at line crosses; (2) their proposed method was based on a single QTL model whereas ours is based on a multiple QTL model; and (3) they used skewed Student's t-distributions to describe phenotypic residuals whereas we adopted a Student's t-distribution. In the single QTL model, it may be reasonable to assume that residuals follow skewed Student's t-distributions, because the 'skewness' may absorb the effects of other QTLs on phenotypes. However, no 'skewness' is necessary for the multiple QTL model.
A complete Bayesian mapping requires the sampling of genotypes for QTL and missing markers, which aggravates the computational cost of Bayesian robust analyses. To alleviate this problem, we evenly partition the entire genome into small intervals by a number of points and restrict putative QTL to these fixed points, as proposed by Yi et al. (2005). This strategy greatly reduces computational time, because the expected values of the indicator variables for all putative QTL are estimated from the conditional probabilities of their genotypes given the two flanking markers before the MCMC procedure starts. Other ways to improve the efficiency of analyzing many QTL effects with Bayesian model selection include specifying a prior inclusion probability for epistasis and using the Metropolis–Hastings algorithm to perform fast sampling of the binary indicators (Yi et al., 2007).

Funding: National Institutes of Health (R01 AR050496-01, R21 AG027110, R01 AG026564, and P50 AR055081, partial); Dickson/Missouri Endowment (to H.W.D.); the Chinese National Natural Science Foundation (Grant 30471236 to R.Y.).

Conflict of Interest: none declared.

REFERENCES

1. Bayesian methods for quantitative trait loci mapping based on model selection: approximate analysis using the Bayesian information criterion.

Authors: R D Ball
Journal: Genetics Date: 2001-11 Impact factor: 4.562

2. Model choice in gene mapping: what and why.

3. A quantitative trait locus mixture model that avoids spurious LOD score peaks.

4. Mapping quantitative trait loci with censored observations.

5. Bayesian shrinkage estimation of quantitative trait loci parameters.

6. Bayesian mapping of genomewide interacting quantitative trait loci for ordinal traits.

7. A general mixture model for mapping quantitative trait loci by using molecular markers.

8. Mapping mendelian factors underlying quantitative traits using RFLP linkage maps.

9. A nonparametric approach for mapping quantitative trait loci.

10. Multiple genetic loci modify susceptibility to plasmacytoma-related morbidity in E(mu)-v-abl transgenic mice.
