
Comparison of linkage disequilibrium estimated from genotypes versus haplotypes for crossbred populations.

Setegn Worku Alemu1, Piter Bijma2, Mario P L Calus2, Huiming Liu3, Rohan L Fernando4, Jack C M Dekkers4.   

Abstract

BACKGROUND: Linkage disequilibrium (LD) is commonly measured based on the squared coefficient of correlation $r^2$ between the alleles at two loci that are carried by haplotypes. LD can also be estimated as the $r^2$ between unphased genotype dosages at two loci when the allele frequencies and inbreeding coefficients at both loci are identical for the parental lines. Here, we investigated whether $r^2$ for a crossbred population (F1) can be estimated using genotype data. The parental lines of the crossbred (F1) can be purebred or crossbred.
METHODS: We approached this by first showing that inbreeding coefficients for an F1 crossbred population are negative, and typically differ in size between loci. Then, we proved that the expected $r^2$ computed from unphased genotype data is identical to the $r^2$ computed from haplotype data for an F1 crossbred population, regardless of the inbreeding coefficients at the two loci. Finally, we investigated the bias and precision of the $r^2$ estimated using unphased genotype versus haplotype data in stochastic simulation.
RESULTS: Our findings show that estimates of $r^2$ based on haplotype and unphased genotype data are both unbiased for different combinations of allele frequencies, sample sizes (900, 1800, and 2700), and levels of LD. In general, for all allele frequency combinations and $r^2$ values considered, and for both methods to estimate $r^2$, the precision of the estimates increased and the bias of the estimates decreased as sample size increased, indicating that both estimators are consistent. For a given scenario, the $r^2$ estimates were more precise and less biased using haplotype data than using unphased genotype data. As sample size increased, the difference in precision and bias between the $r^2$ estimates using haplotype data and unphased genotype data decreased.
CONCLUSIONS: Our theoretical derivations showed that estimates of LD between loci based on unphased genotypes and haplotypes in F1 crossbreds have identical expectations. Based on our simulation results, we conclude that the LD for an F1 crossbred population can be accurately estimated from unphased genotype data. The results also apply for other crosses (F2, F3, Fn, BC1, BC2, and BCn), as long as (selected) individuals from the two parental lines mate randomly.
© 2022. The Author(s).

Year:  2022        PMID: 35135468      PMCID: PMC8822837          DOI: 10.1186/s12711-022-00703-z

Source DB:  PubMed          Journal:  Genet Sel Evol        ISSN: 0999-193X            Impact factor:   4.297


Background

Linkage disequilibrium (LD) is the non-random association of alleles at different loci within haplotypes. LD plays an important role in both population and quantitative genetics. In population genetics, LD can for example be used to detect selection [1]. In quantitative genetics, LD has been used to map quantitative trait loci [1-3] and for marker-assisted selection [4] and genomic selection [5]. Thus, knowledge of LD is required for diverse applications in genetics. LD is traditionally measured based on the comparison of the observed haplotype frequencies with the expected haplotype frequencies under linkage equilibrium. A common statistical measure of LD is the covariance between loci, $D$, which is equal to the excess of coupling phase haplotypes, $D = p_{MN} - p_M p_N$, where $p_{MN}$ refers to the frequency of gametes (haplotypes) that carry the pair of alleles $M$ and $N$ at the two loci, $p_M$ and $p_N$ refer to the allele frequency at locus $M$ and locus $N$, respectively, and $p_M p_N$ is the expected frequency of this haplotype under linkage equilibrium [6]. Another common measure is the squared coefficient of correlation ($r^2$) between the alleles at the two loci within haplotypes, $r^2 = D^2 / [p_M (1 - p_M)\, p_N (1 - p_N)]$ [7]. To calculate $D$ and $r^2$ using the expressions given above, the haplotypes carried by the individuals must be known. However, Rogers and Huff [8] showed that LD can also be estimated by correlating unphased genotype dosages at the two loci, which makes the computation simple and fast. They demonstrated that LD estimated from unphased genotypes yields very similar results to LD estimated from haplotypes. In their derivation, however, they assumed equal inbreeding coefficients for the two loci and equal allele frequencies for the paternal and maternal gametes that created the population. In this context, the inbreeding coefficient measures the departure from Hardy–Weinberg equilibrium and, thus, can take positive or negative values.
However, for crossbred individuals inbreeding coefficients can differ between the two loci, and paternal and maternal allele frequencies can differ because the two parents come from different lines. Here, we investigated whether LD in crossbred populations can be estimated using unphased genotype data. We assumed that sires and dams of the crossbreds originate from two distinct lines but are otherwise mated to each other at random. We address this question in three steps. First, we derive the inbreeding coefficients of crossbreds, showing that they take negative values that typically differ between loci. As a result, the derivation of Rogers and Huff [8] cannot be used to demonstrate the equivalence of genotype-based LD to haplotype-based LD for a crossbred population. Second, we show theoretically that LD computed from genotype frequencies has the same expected value for a given dataset as LD computed from haplotype frequencies, even for a crossbred population. Finally, we investigate the precision and potential bias of LD estimated from unphased genotype data versus haplotype data, using stochastic simulation.
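The two ways of measuring LD described in this section can be illustrated with a minimal Python sketch. The function names and all input values are hypothetical and chosen only for illustration; the two calls use unrelated toy inputs and are not expected to give the same number.

```python
def ld_from_haplotypes(p_mn, p_m, p_n):
    """Return (D, r2) from haplotype and allele frequencies at two bi-allelic loci."""
    d = p_mn - p_m * p_n  # excess of coupling-phase haplotypes
    r2 = d ** 2 / (p_m * (1 - p_m) * p_n * (1 - p_n))
    return d, r2


def r2_from_genotypes(dosage_m, dosage_n):
    """Squared Pearson correlation between unphased genotype dosages (0, 1, 2)."""
    n = len(dosage_m)
    mean_m = sum(dosage_m) / n
    mean_n = sum(dosage_n) / n
    cov = sum((a - mean_m) * (b - mean_n) for a, b in zip(dosage_m, dosage_n))
    var_m = sum((a - mean_m) ** 2 for a in dosage_m)
    var_n = sum((b - mean_n) ** 2 for b in dosage_n)
    return cov ** 2 / (var_m * var_n)


# Hypothetical inputs, for illustration only.
d, r2_hap = ld_from_haplotypes(p_mn=0.4, p_m=0.5, p_n=0.6)
r2_gen = r2_from_genotypes([0, 1, 2, 1, 0, 2], [0, 1, 2, 2, 0, 1])
print(f"D = {d:.3f}, haplotype r2 = {r2_hap:.3f}, genotype r2 = {r2_gen:.4f}")
```

The haplotype-based function needs phased data (the joint haplotype frequency), whereas the genotype-based function only needs the per-individual allele counts at each locus.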

Methods

Inbreeding coefficients for a crossbred population

Consider two outbred lines, $A$ and $B$. We want to investigate the inbreeding coefficients for two bi-allelic loci, $M$ and $N$, in the F1 crossbred offspring that result from crossing random individuals from the two parental lines. With alleles denoted 0 and 1, $p_{M,A}$ is the frequency of allele 1 at locus $M$ in line $A$, and $p_{M,B}$ is the frequency of allele 1 at locus $M$ in line $B$. The expected frequency of allele 1 at locus $M$ in the crossbreds then is

$$p_M = \frac{p_{M,A} + p_{M,B}}{2}.$$

With random mating between individuals from the two parental lines, the frequency of genotype 11 in the crossbreds is

$$f_{11} = p_{M,A}\, p_{M,B}.$$

The deviation of this frequency from Hardy–Weinberg equilibrium follows from [6, 9]:

$$p_{M,A}\, p_{M,B} = p_M^2 + p_M (1 - p_M) F_M,$$

where $F_M$ is the inbreeding coefficient at locus $M$ in the crossbreds. The inbreeding coefficient follows from solving this expression for $F_M$, substituting $p_M = (p_{M,A} + p_{M,B})/2$, and simplifying the expression, giving:

$$F_M = -\frac{(p_{M,A} - p_{M,B})^2}{(p_{M,A} + p_{M,B})(2 - p_{M,A} - p_{M,B})}.$$

Similarly,

$$F_N = -\frac{(p_{N,A} - p_{N,B})^2}{(p_{N,A} + p_{N,B})(2 - p_{N,A} - p_{N,B})}.$$

Note that the numerators of $F_M$ and $F_N$ are always negative, except when $p_{M,A} = p_{M,B}$ and $p_{N,A} = p_{N,B}$, while the denominators are always positive. This shows that the inbreeding coefficients of crossbreds are negative, meaning that heterozygosity is greater than would be expected under Hardy–Weinberg equilibrium. We investigated under which conditions the inbreeding coefficients at the two loci are equal by solving the expression $F_M = F_N$ for the allele frequencies, using Wolfram Mathematica (www.wolfram.com). Apart from the trivial solutions of $p = 0$, $p = 1$, and equal allele frequencies at both loci, we found only three solutions (see Appendix 1). Hence, this result demonstrates that the inbreeding coefficients at two arbitrary loci in a crossbred population will usually be different. This implies that the derivation of Rogers and Huff [8] cannot be used to demonstrate the equivalence of genotype-based LD to haplotype-based LD for a crossbred population.
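The derivation in this subsection can be checked numerically with a short Python sketch. It computes the F1 inbreeding coefficient at one locus directly from the Hardy–Weinberg deviation of the genotype 11 frequency and compares it with the simplified closed form, which is minus the squared difference between the two line frequencies divided by the product of their sum and (2 minus their sum). The function names and the allele frequencies are assumptions made for illustration and are not values from the paper.

```python
def f1_inbreeding(p_a, p_b):
    """Inbreeding coefficient at one locus in the F1, from the HWE deviation.

    p_a, p_b: frequency of allele 1 at this locus in the two parental lines.
    """
    p = (p_a + p_b) / 2    # allele frequency in the crossbreds
    freq_11 = p_a * p_b    # genotype 11 frequency when the lines are mated at random
    return (freq_11 - p ** 2) / (p * (1 - p))


def f1_inbreeding_closed_form(p_a, p_b):
    """Simplified expression obtained after substituting p = (p_a + p_b) / 2."""
    return -((p_a - p_b) ** 2) / ((p_a + p_b) * (2 - p_a - p_b))


# Hypothetical allele frequencies for two loci (locus M and locus N).
f_m = f1_inbreeding(0.2, 0.6)
f_n = f1_inbreeding(0.3, 0.9)
print(f"F at locus M: {f_m:.4f}, closed form: {f1_inbreeding_closed_form(0.2, 0.6):.4f}")
print(f"F at locus N: {f_n:.4f}, closed form: {f1_inbreeding_closed_form(0.3, 0.9):.4f}")
```

With these illustrative inputs both coefficients are negative and differ between the two loci, which is the situation in which the equal-inbreeding assumption of Rogers and Huff [8] does not hold.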

Haplotype-based linkage disequilibrium

In this section, we show that the expected LD based on $r^2$ computed from the genotype frequencies of the crossbred population is identical to the true $r^2$ based on haplotype frequencies, even when the inbreeding coefficients differ between the two loci. Note that we consider the true (i.e., population) value of $r^2$ here, rather than an estimate from a sample. As we consider bi-allelic loci, we have four haplotype frequencies for each line, denoted $r$, $s$, $t$, and $u$ for line $A$, and using $'$ to refer to frequencies for line $B$, we have haplotype frequencies $r'$, $s'$, $t'$, and $u'$ for line $B$. Table 1 shows expressions for the marginal frequency for each of the alleles. Although the expressions for the marginal frequencies in Table 1 can be simplified by formulating them in terms of allele frequencies, we stick to the haplotype frequencies to facilitate comparison with results for the genotype-based $r^2$.
Table 1

Haplotype frequencies and marginal allele frequencies for line $A$^a

Allele at locus M | Allele at locus N: 0 | Allele at locus N: 1 | Marginal frequency
0 | $r$ | $s$ | $r+s$
1 | $t$ | $u$ | $t+u$
Marginal frequency | $r+t$ | $s+u$ | $r+s+t+u=1$

a Corresponding symbols for line $B$ are denoted by $'$

Crossbred genotypes consist of two sets of haplotypes, one from each parental line, which may have a different $r^2$. By definition, the $r^2$ in the crossbreds depends on the (co)variances between loci in the crossbred population, so we cannot simply average the $r^2$ of the two parental lines. From the definitions of correlation, variance, and covariance, it follows that the $r^2$ for the crossbred population equals the square of the average of the covariances between haplotypes for each of the two lines, divided by the product of the average variance across the two lines at each locus. For line $A$, the covariance between haplotypes (i.e. $D$) follows from Table 1 as $D_A = u - (t+u)(s+u)$, where $u$ is the expectation of the cross product of the allele counts at each locus, while $(t+u)(s+u)$ is the cross product of the expectations of these allele counts (the expected haplotype frequency in line $A$ under linkage equilibrium). Hence, this result follows immediately from the definition of a covariance. The covariance ($D_B$) for line $B$ is analogous, using symbols denoted by $'$. The variance in allele count follows from the binomial distribution with n = 1 for haplotypes and is thus equal to $p(1-p)$, with $p$ denoting the allele frequency. For line $A$ the variance equals $(t+u)(1-t-u)$ for locus $M$, and $(s+u)(1-s-u)$ for locus $N$, with analogous equations for line $B$. Using these values in the haplotype-based $r^2$ for the crossbred population yields the following true $r^2$ in the crossbred population:

$$r^2 = \frac{\left[u - (t+u)(s+u) + u' - (t'+u')(s'+u')\right]^2}{\left[(t+u)(1-t-u) + (t'+u')(1-t'-u')\right]\left[(s+u)(1-s-u) + (s'+u')(1-s'-u')\right]} \qquad (1)$$

where the numerator is the square of the average of the covariances for the two parental lines, while the denominator is the product of the average of the variances. Note that the constant $2^2$ in the numerator of Eq. (1) and $2^2$ in the denominator of Eq. (1) (2 for each variance) cancelled out in the derivation of the equation.
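As a minimal sketch of the haplotype-based calculation described above, the Python snippet below computes the within-line covariance and per-locus variances from the four haplotype frequencies of each line, and then combines the two lines. The function names and the haplotype frequencies are hypothetical; the frequencies are ordered as haplotypes 00, 01, 10, 11 (allele at locus M first), matching Table 1.

```python
def line_moments(r, s, t, u):
    """Covariance D and per-locus variances for one line (haplotypes 00, 01, 10, 11)."""
    p_m = t + u              # frequency of allele 1 at locus M
    p_n = s + u              # frequency of allele 1 at locus N
    d = u - p_m * p_n        # covariance between the two loci within haplotypes
    return d, p_m * (1 - p_m), p_n * (1 - p_n)


def crossbred_r2_haplotype(line_a, line_b):
    """Squared summed covariance over the product of summed variances.

    The factors 1/2 from averaging over the two lines cancel, so sums are used.
    """
    d_a, var_m_a, var_n_a = line_moments(*line_a)
    d_b, var_m_b, var_n_b = line_moments(*line_b)
    return (d_a + d_b) ** 2 / ((var_m_a + var_m_b) * (var_n_a + var_n_b))


# Hypothetical haplotype frequencies (r, s, t, u) for the two parental lines.
line_a = (0.4, 0.1, 0.1, 0.4)
line_b = (0.1, 0.2, 0.3, 0.4)
r2_cross = crossbred_r2_haplotype(line_a, line_b)
print(f"haplotype-based r2 in the crossbreds: {r2_cross:.4f}")
```

Writing the result with explicit averages (dividing the summed covariance by 2 and each summed variance by 2) gives the same value, which is the cancellation of constants noted in the text.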

Genotype-based squared correlation

The following inputs are required to derive the genotype-based $r^2$ in crossbreds: genotype frequencies and the expectations of squares and cross products of genotype dosage, 0, 1, and 2, in crossbreds. Using the haplotype frequencies in Table 1 and the assumption that individuals of line $A$ mate at random to individuals of line $B$, we find the genotype frequencies in the crossbred population as shown in Table 2. Next, using these genotype frequencies, Table 3 shows the expectations of squares and cross products of genotype dosages. Computations of the expectations of combinations of genotypic values are in Appendix 1.
Table 2

Expected genotype frequencies in the crossbred offspring when individuals from lines $A$ and $B$ are mated at random to each other

Line $A$ haplotype | Line $B$ haplotype: 00 ($r'$) | 01 ($s'$) | 10 ($t'$) | 11 ($u'$)
00 ($r$)^a | 00/00: $r'r$^b | 00/01: $s'r$ | 00/10: $t'r$ | 00/11: $u'r$
01 ($s$) | 01/00: $r's$ | 01/01: $s's$ | 01/10: $t's$ | 01/11: $u's$
10 ($t$) | 10/00: $r't$ | 10/01: $s't$ | 10/10: $t't$ | 10/11: $u't$
11 ($u$) | 11/00: $r'u$ | 11/01: $s'u$ | 11/10: $t'u$ | 11/11: $u'u$

a Marginal frequencies are haplotype frequencies

b Joint frequencies are the genotype frequencies
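The cross-tabulation in Table 2 can be reproduced numerically. The Python sketch below is an illustration under assumed inputs: it builds the joint frequencies of line A and line B haplotypes, collapses them to unordered genotype dosages, computes the squared correlation between dosages, and compares it with the haplotype-based value of Eq. (1). The function names and the haplotype frequencies are hypothetical.

```python
from itertools import product

# (allele at locus M, allele at locus N), in the order of frequencies r, s, t, u
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def genotype_table(line_a, line_b):
    """Frequencies of (M dosage, N dosage) in the F1 when lines are mated at random."""
    table = {}
    pairs = product(zip(HAPLOTYPES, line_a), zip(HAPLOTYPES, line_b))
    for (hap_a, freq_a), (hap_b, freq_b) in pairs:
        dosage = (hap_a[0] + hap_b[0], hap_a[1] + hap_b[1])
        table[dosage] = table.get(dosage, 0.0) + freq_a * freq_b
    return table


def r2_genotype(table):
    """Squared correlation between genotype dosages at the two loci."""
    e_m = sum(f * m for (m, n), f in table.items())
    e_n = sum(f * n for (m, n), f in table.items())
    e_mm = sum(f * m * m for (m, n), f in table.items())
    e_nn = sum(f * n * n for (m, n), f in table.items())
    e_mn = sum(f * m * n for (m, n), f in table.items())
    cov = e_mn - e_m * e_n
    return cov ** 2 / ((e_mm - e_m ** 2) * (e_nn - e_n ** 2))


def r2_haplotype(line_a, line_b):
    """Haplotype-based value: squared summed covariance over product of summed variances."""
    cov, var_m, var_n = 0.0, 0.0, 0.0
    for r, s, t, u in (line_a, line_b):
        p_m, p_n = t + u, s + u
        cov += u - p_m * p_n
        var_m += p_m * (1 - p_m)
        var_n += p_n * (1 - p_n)
    return cov ** 2 / (var_m * var_n)


# Hypothetical haplotype frequencies (r, s, t, u) for lines A and B.
line_a = (0.4, 0.1, 0.1, 0.4)
line_b = (0.1, 0.2, 0.3, 0.4)
table = genotype_table(line_a, line_b)
print(f"genotype-based r2:  {r2_genotype(table):.6f}")
print(f"haplotype-based r2: {r2_haplotype(line_a, line_b):.6f}")
```

For these inputs the two values agree to floating-point precision, in line with the claim that the population-level genotype-based and haplotype-based measures coincide in an F1 cross; this is a numerical check for one set of frequencies, not a proof.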

Table 3

Unordered genotypes, their genotype dosages, frequencies, and expectations of genotype dosages, and squares and cross products of genotype dosages, for locus M and N, in the crossbred offspring from random mating between lines A and B

Genotype dosage ($M_g$, $N_g$) | Frequency ($f$) | Expectations
$M_g$ | $N_g$ | $f$ | $E(M_g^2)$^a | $E(M_g)$ | $E(N_g^2)$ | $E(N_g)$ | $E(M_g N_g)$
0 | 0 | $rr'$ | 0 | 0 | 0 | 0 | 0
0 | 1 | $r's+rs'$ | 0 | 0 | $r's+rs'$ | $r's+rs'$ | 0
0 | 2 | $s's$ | 0 | 0 | $4s's$ | $2s's$ | 0
1 | 0 | $r't+rt'$ | $r't+rt'$ | $r't+rt'$ | 0 | 0 | 0
1 | 1 | $r'u+ru'+s't+st'$ | $r'u+ru'+s't+st'$ | $r'u+ru'+s't+st'$ | $r'u+ru'+s't+st'$ | $r'u+ru'+s't+st'$ | $r'u+ru'+s't+st'$
1 | 2 | $s'u+su'$ | $s'u+su'$ | $s'u+su'$ | $4s'u+4su'$ | $2s'u+2su'$ | $2s'u+2su'$
2 | 0 | $t't$ | $4t't$ | $2t't$ | 0 | 0 | 0
2 | 1 | $t'u+tu'$ | $4t'u+4tu'$ | $2t'u+2tu'$ | $t'u+tu'$ | $t'u+tu'$ | $2t'u+2tu'$
2 | 2 | $u'u$ | $4u'u$ | $2u'u$ | $4u'u$ | $2u'u$ | $4u'u$
Sum | | 1 | $(t+u)+(t'+u')+2(t'+u')(t+u)$ | $(t+u)+(t'+u')$ | $(s+u)+(s'+u')+2(s'+u')(s+u)$ | $(s+u)+(s'+u')$ | $u'(1+t+u)+u(1+t'+u')+s'(t+u)+s(t'+u')$

w

^a $E(x_M^2)$ refers to the expectation of the squared genotype dosage for locus $M$, and similar definitions apply for the other expectations. The derivation of the expectations of genotype dosages is given in Appendix 2

Using the values in Table 3, the covariance of genotype dosage at the two loci follows from $\mathrm{cov}(x_M, x_N) = E(x_M x_N) - E(x_M)E(x_N)$, where $x_M$ and $x_N$ are the genotype dosages at loci $M$ and $N$, and the variances of genotype dosage follow from $\mathrm{var}(x_M) = E(x_M^2) - [E(x_M)]^2$ and the corresponding expression for locus $N$. Substituting the resulting expressions into the expression for the correlation coefficient yields the expectation of the genotype-based $r^2$ (Eq. (2)). This expression is identical to the expression for the true haplotype-based $r^2$ (Eq. (1)). Thus, when two lines (which can be pure or crossbred) are crossed and individuals from the two lines are mated at random to each other, the expectations of the genotype-based and the haplotype-based $r^2$ in the crossbreds (F1, F2, Fn) and in other cross types (BC1, BC2, BCn) are identical, irrespective of differences in the inbreeding coefficients at the two loci. Note that our derivation also applies to other measures of LD, i.e. $D$ and $D'$. For example, measures of $D$ based on genotypes and haplotypes are the numerators of Eqs. (1) and (2), which are identical. Furthermore, using Eqs. (1) and (2), the $r^2$ in the crossbred population can be predicted if the haplotype and genotype frequencies of the two parental lines are known. Note that Eq. (2) refers to the expected $r^2$ between the genotype dosages at the two loci, not to an estimate thereof. Hence, although the expected values of the genotype-based and haplotype-based $r^2$ are identical, their estimates for a given data set may differ depending on sampling bias and the sampling errors of the estimates. This is investigated using a simulation study in the next section.
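The moment expressions in the bottom row of Table 3 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that $s$, $t$ and $u$ are the frequencies of haplotypes 01, 10 and 11 in line $A$ (so that the allele frequency at locus $M$ is $t+u$), that primed symbols refer to line $B$, and it uses made-up haplotype frequencies. It enumerates all pairs of one haplotype from each line and compares the resulting moments with the closed forms.

```python
from fractions import Fraction as F
from itertools import product

# Haplotypes as (allele at locus M, allele at locus N), in the order 00, 01, 10, 11.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def enumerated_moments(line_a, line_b):
    """Dosage moments obtained by enumerating every pair of haplotypes.

    line_a, line_b: haplotype frequencies (f00, f01, f10, f11) of each line.
    """
    e = {"xM": F(0), "xM2": F(0), "xN": F(0), "xN2": F(0), "xMxN": F(0)}
    for (hap_a, f_a), (hap_b, f_b) in product(
        zip(HAPLOTYPES, line_a), zip(HAPLOTYPES, line_b)
    ):
        weight = f_a * f_b  # one haplotype is drawn independently from each line
        x_m = hap_a[0] + hap_b[0]  # genotype dosage at locus M
        x_n = hap_a[1] + hap_b[1]  # genotype dosage at locus N
        e["xM"] += weight * x_m
        e["xM2"] += weight * x_m**2
        e["xN"] += weight * x_n
        e["xN2"] += weight * x_n**2
        e["xMxN"] += weight * x_m * x_n
    return e


def table3_moments(line_a, line_b):
    """Closed-form moments as written in the bottom row of Table 3."""
    _, s, t, u = line_a
    _, s_p, t_p, u_p = line_b  # primed symbols: line B
    return {
        "xM": (t + u) + (t_p + u_p),
        "xM2": (t + u) + (t_p + u_p) + 2 * (t_p + u_p) * (t + u),
        "xN": (s + u) + (s_p + u_p),
        "xN2": (s + u) + (s_p + u_p) + 2 * (s_p + u_p) * (s + u),
        "xMxN": u_p * (1 + t + u) + u * (1 + t_p + u_p)
        + s_p * (t + u) + s * (t_p + u_p),
    }


# Hypothetical haplotype frequencies (f00, f01, f10, f11); each tuple sums to 1.
line_a = (F(5, 10), F(1, 10), F(1, 10), F(3, 10))
line_b = (F(2, 10), F(3, 10), F(1, 10), F(4, 10))

enum = enumerated_moments(line_a, line_b)
closed = table3_moments(line_a, line_b)

cov = closed["xMxN"] - closed["xM"] * closed["xN"]
var_m = closed["xM2"] - closed["xM"] ** 2
var_n = closed["xN2"] - closed["xN"] ** 2
r2_genotype = cov**2 / (var_m * var_n)  # squared correlation between dosages
```

With exact fractions the enumerated and closed-form moments agree term by term. In this example the covariance also equals the sum of the within-line disequilibria, $u - (t+u)(s+u)$ for line $A$ plus the primed equivalent for line $B$.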

Simulation

The objective of the simulation was to investigate and compare the bias and precision of the genotype-based and haplotype-based estimates of $r^2$ for a crossbred population. We investigated the bias and the precision for different sets of allele frequencies, levels of LD as measured by $r^2$, and sample sizes. To limit computation time, we directly sampled haplotypes according to their probability distribution, rather than simulating a population of individuals. The haplotype probability distribution follows from the allele frequencies at the two loci and the level of LD. Using the haplotype frequencies and sample size, haplotypes were sampled from a multinomial distribution for each of the two parental lines. The genotypes of the crossbred individuals were obtained by random sampling of one haplotype from each line. Next, the genotype-based and haplotype-based estimates of $r^2$ were computed from the genotypes and haplotypes, respectively, of the crossbred offspring. The parameter values used for simulation (allele frequencies, $r^2$ for each line, and sample size) were also used to compute the true $r^2$ in the crossbreds, using Eq. (1), which served as a benchmark to evaluate the precision and bias of the two estimates of $r^2$. Thus, there were three measures of $r^2$: the true $r^2$ calculated from the parameter values used for simulation, the haplotype-based estimate of $r^2$, and the genotype-based estimate of $r^2$. For each set of parameters, results were based on 1000 replicates. We used the R software [10] to simulate the data and analyse the results. The source code for the simulation is available at the following GitHub repository: https://github.com/setegnworku/Simulation-code-for_LD_crossbred_pop
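The sampling steps described above can be sketched as follows. This is an independent illustration in Python rather than the authors' R code, and it rests on several assumptions: the disequilibrium within each line is taken to be positive, each crossbred individual is formed by drawing one haplotype per line directly from the haplotype distribution, only the genotype-based estimate is computed (Eq. (1) is not reproduced here), and the function names, the seed and the use of 200 replicates are arbitrary choices.

```python
import math
import random
import statistics


def haplotype_freqs(p_m, p_n, r2):
    """Haplotype frequencies (f00, f01, f10, f11) from allele frequencies and r^2.

    Assumes positive disequilibrium D within the line.
    """
    d = math.sqrt(r2 * p_m * (1 - p_m) * p_n * (1 - p_n))
    f11 = p_m * p_n + d
    f10 = p_m - f11
    f01 = p_n - f11
    f00 = 1 - f11 - f10 - f01
    freqs = (f00, f01, f10, f11)
    if min(freqs) < 0:
        raise ValueError("allele frequencies and r2 are not compatible")
    return freqs


def simulate_dosages(freqs_a, freqs_b, n, rng):
    """Draw one haplotype per line for n crossbreds; return dosages at M and N."""
    haps_a = rng.choices(range(4), weights=freqs_a, k=n)
    haps_b = rng.choices(range(4), weights=freqs_b, k=n)
    # Index 0..3 encodes haplotypes 00, 01, 10, 11: high bit = M, low bit = N.
    x_m = [(a >> 1) + (b >> 1) for a, b in zip(haps_a, haps_b)]
    x_n = [(a & 1) + (b & 1) for a, b in zip(haps_a, haps_b)]
    return x_m, x_n


def squared_correlation(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy**2 / (sxx * syy)


def expected_dosage_r2(freqs_a, freqs_b):
    """Population squared correlation between dosages in the cross."""
    def parts(freqs):
        _, f01, f10, f11 = freqs
        p_m, p_n = f10 + f11, f01 + f11
        return f11 - p_m * p_n, p_m * (1 - p_m), p_n * (1 - p_n)

    d_a, vm_a, vn_a = parts(freqs_a)
    d_b, vm_b, vn_b = parts(freqs_b)
    return (d_a + d_b) ** 2 / ((vm_a + vm_b) * (vn_a + vn_b))


rng = random.Random(2022)
freqs_a = haplotype_freqs(0.25, 0.25, 0.4)  # line A
freqs_b = haplotype_freqs(0.45, 0.45, 0.4)  # line B
estimates = [
    squared_correlation(*simulate_dosages(freqs_a, freqs_b, 900, rng))
    for _ in range(200)
]
target = expected_dosage_r2(freqs_a, freqs_b)
mean_estimate = statistics.fmean(estimates)
sd_estimate = statistics.stdev(estimates)
```

Comparing `mean_estimate` with `target` gives the empirical bias of the genotype-based estimator for this one scenario, and `sd_estimate` gives its precision.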

Scenarios investigated

We considered two biallelic loci in crossbreds resulting from the random mating of two outbred lines ($A$ and $B$). We varied three parameters: (i) allele frequencies and (ii) $r^2$ in the parental lines, and (iii) the sample size. For the allele frequencies, we considered a range from 0.05 to 0.45, incremented by 0.10, for both lines. To limit the number of scenarios, we used equal allele frequencies at the two loci for most scenarios. Note that there is no true difference between the major and the minor allele, e.g., an allele frequency of 0.05 is equivalent to one of 0.95, such that results for allele frequencies ranging from 0.55 to 0.95 are identical to those for 0.05 to 0.45. For $r^2$ in the parental lines, we considered values of 0.2, 0.4, 0.6, and 0.8. To reduce the number of scenarios, $r^2$ was the same in both lines. We considered sample sizes of 900, 1800, and 2700. This resulted in a total of 180 scenarios with equal allele frequencies at the two loci within each line, of which 120 had different allele frequencies between the two lines, and all had equal $r^2$ in the two lines (Table 4). In addition to those 180 scenarios, we investigated a few scenarios in which allele frequencies differed between loci within the parental lines and $r^2$ differed between the parental lines.
Table 4

Combinations of minor allele frequencies for lines $A$ and $B$ investigated in the simulation^a

Line $B$ | Line $A$: 0.05 | 0.15 | 0.25 | 0.35 | 0.45
0.05 | X | X | X | X | X
0.15 |   | X | X | X | X
0.25 |   |   | X | X | X
0.35 |   |   |   | X | X
0.45 |   |   |   |   | X

^a Allele frequencies were equal for the two loci ($M$ and $N$) within a line. Apart from the diagonal elements, the allele frequencies differed between the two lines. Scenarios in this table were replicated for sample sizes of 900, 1800, and 2700, and for $r^2$ in the parental lines equal to 0.2, 0.4, 0.6, and 0.8 (equal for both lines), yielding a total of 3 × 4 × 15 = 180 scenarios
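The scenario count can be reproduced by enumerating the grid. The sketch below is illustrative only: the variable names are hypothetical, and which line receives the smaller allele frequency in each unordered pair is an arbitrary choice.

```python
from itertools import combinations_with_replacement, product

maf_values = [0.05, 0.15, 0.25, 0.35, 0.45]  # minor allele frequencies
r2_values = [0.2, 0.4, 0.6, 0.8]             # r^2 in both parental lines
sample_sizes = [900, 1800, 2700]

# Unordered pairs of line frequencies, diagonal included: the 15 cells of Table 4.
line_pairs = list(combinations_with_replacement(maf_values, 2))

scenarios = [
    {"maf_a": maf_a, "maf_b": maf_b, "r2": r2, "n": n}
    for (maf_a, maf_b), r2, n in product(line_pairs, r2_values, sample_sizes)
]

# Scenarios in which the two lines have different allele frequencies.
n_different = sum(1 for sc in scenarios if sc["maf_a"] != sc["maf_b"])
```

Here `len(scenarios)` gives the total number of scenarios and `n_different` the number with unequal allele frequencies between lines.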


Results and discussion

The full results for all 180 simulated scenarios, including the bias, the ratio of precision (ratio of the standard deviations of the estimates using unphased genotype and haplotype data), and the correlation of the standard deviations of the estimates using unphased genotype and haplotype data, are given in the following R Shiny app (https://setegnmaths.shinyapps.io/LD_App/). The source code for the Shiny app is available in the following GitHub repository: https://github.com/setegnworku/Linkage_disequilibrium_crossbred_ShinyApp. Results showed that the estimates of $r^2$ for the 180 scenarios were unbiased, both for the haplotype-based and the unphased genotype-based estimates. Moreover, simulation results confirmed our theoretical finding that the unphased genotype-based and haplotype-based $r^2$ are on average the same for a given dataset, irrespective of differences in inbreeding coefficients between the two loci (Fig. 1).
Fig. 1

Comparison of estimates of linkage disequilibrium based on unphased genotype and haplotype data for scenarios where allele frequencies differed between loci and between lines, and where $r^2$ differed between lines $A$ and $B$ (0.2 and 0.4). Sample size was 900 (1000 replicates)

As shown in Fig. 1, the estimate of $r^2$ for a given dataset was unbiased for scenarios where allele frequencies differed between loci (i.e., inbreeding coefficients differed between the two loci) and between lines, and when $r^2$ differed between the lines (0.2 and 0.4). We also tested the bias of LD estimates using unphased genotype and haplotype data for different sample sizes (Fig. 2). As shown in Fig. 2, for all scenarios, both estimators were unbiased for sample sizes above 300. However, with a sample size of 300 or less (100, 200, and 300), we found a small downward bias for both the unphased genotype- and haplotype-based estimates (the independent sample t-test showed that the bias was significant for some of the scenarios, for both the unphased genotype- and haplotype-based estimates). The estimator of the correlation coefficient is known to be biased, and more so for smaller samples [11], which may explain the bias we found in small samples.
Fig. 2

Comparison of linkage disequilibrium estimated from unphased genotype and haplotype data, for different sample sizes


Bias

For all 180 scenarios, the estimates of $r^2$ using unphased genotype and haplotype data were both unbiased. We ran an independent sample t-test to test the bias of the estimates of $r^2$ using unphased genotype and haplotype data relative to the true $r^2$. For all 180 scenarios, the bias of the estimates was not significantly different from zero for both methods (p value > 0.05). The average absolute bias across the 180 scenarios was 0.0004 when using unphased genotype data, and 0.0003 when using haplotype data (Table 5). The maximum absolute bias across the 180 scenarios was 0.003 when using unphased genotype data and 0.002 when using haplotype data. As expected, the bias decreased as sample size increased. For example, with unphased genotype data, the average absolute bias was 0.0005 for a sample size of 900 and 0.0001 for a sample size of 2700. Corresponding values for haplotype data were 0.0003 and 0.0001. These results indicate that the estimators of $r^2$ are consistent for both unphased genotype data and haplotype data, because the bias of the estimates decreased as sample size increased.
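For a single scenario, the bias and its test statistic can be computed from the replicate estimates along the following lines. This is a hedged sketch: it uses a one-sample t statistic against the known true value, which may differ from the exact test used in the paper, and the replicate values shown are invented purely for illustration.

```python
import math
import statistics


def summarise_replicates(estimates, true_value):
    """Bias, standard deviation and one-sample t statistic of replicate estimates."""
    n = len(estimates)
    bias = statistics.fmean(estimates) - true_value
    sd = statistics.stdev(estimates)   # precision across replicates
    t_stat = bias / (sd / math.sqrt(n))  # compare with a t distribution, n - 1 df
    return {"bias": bias, "sd": sd, "t": t_stat}


# Invented replicate estimates of r^2 for a scenario with a true r^2 of 0.40.
toy_estimates = [0.41, 0.38, 0.40, 0.43, 0.39, 0.42, 0.37, 0.40]
summary = summarise_replicates(toy_estimates, true_value=0.40)
```

In an actual analysis `estimates` would hold the 1000 replicate values of one scenario, and the summary would be repeated for each of the 180 scenarios and for both estimators.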
Table 5

Summary of estimates of bias and precision (standard deviation) of $r^2$ using unphased genotype and haplotype data

Parameter | Haplotype | Genotype
Average absolute bias across 180 scenarios | 0.0003 | 0.0004
Maximum absolute bias | 0.002 | 0.003
Average absolute bias, sample size 900 | 0.0003 | 0.0005
Average absolute bias, sample size 2700 | 0.0001 | 0.0001
Standard deviation (SD) across 180 scenarios | 0.021 | 0.023
Maximum SD | 0.055 | 0.057
Average SD with sample size of 900 | 0.027 | 0.031
Average SD with sample size of 2700 | 0.016 | 0.018

Precision

For all scenarios, estimates of LD based on haplotype data were more precise than estimates based on unphased genotype data, although the differences were small. For example, the mean standard deviation of the estimates of $r^2$ across all scenarios was 0.023 when using unphased genotype data and 0.021 when using haplotype data. The maximum standard deviation for estimates of $r^2$ across all scenarios was 0.057 using unphased genotype data and 0.055 using haplotype data. The precision of the estimates of $r^2$ increased as sample size increased, both with unphased genotype and with haplotype data. For example, the average standard deviation across all scenarios with a sample size of 900 was 0.031 with unphased genotype data and 0.027 with haplotype data. The corresponding values for a sample size of 2700 were 0.018 and 0.016. This result was as expected because the standard error of the estimate of a correlation coefficient decreases as sample size increases [12]. Thus, with a sufficient sample size, $r^2$ in crossbreds can be estimated accurately based on unphased genotype data.

We further investigated in which scenarios the difference in precision between the estimates of $r^2$ using unphased genotype versus haplotype data was the largest. We did this by computing the ratio of the standard deviation of the estimates of $r^2$ using haplotype data to that using unphased genotype data. Thus, smaller values of this ratio indicate a greater superiority of estimates based on haplotypes. As shown in Fig. 3, the ratio of precision was less than 1 for all scenarios, indicating that the estimate based on haplotype data was more precise than that based on unphased genotype data. The ratio of precision increased as the level of LD increased. For example, for an $r^2$ of 0.2, the ratio of precision ranged from 0.75 to 0.9, while with an $r^2$ of 0.8, the ratio ranged from 0.92 to 0.98. The difference between the estimates of $r^2$ based on unphased genotype vs. haplotype data originates solely from the double heterozygotes (00/11 for coupling phase, or 01/10 for repulsion phase). As $r^2$ increases, the frequencies of the coupling phase haplotypes 00 and 11, or of the repulsion phase haplotypes 01 and 10, increase, which reduces the opportunity for the haplotype method to provide extra information by distinguishing between them. As a result, at larger $r^2$, the precisions of the estimates of $r^2$ using unphased genotype and haplotype data are expected to be closer to each other. On the other hand, at low $r^2$, all haplotypes (00, 01, 10, 11) are possible and the haplotype-based method provides additional information. For this reason, the estimate of $r^2$ based on haplotype data is more precise than the estimate based on unphased genotype data, in particular when the true $r^2$ is small.
Fig. 3

Ratio of precision for all scenarios investigated for $r^2$ values of 0.2, 0.4, 0.6, and 0.8

The ratio of precision decreased when the minor allele frequencies for the two loci increased (Figs. 3 and 4). For example, for allele frequencies of 0.05 and 0.05 at the two loci, the ratio of precision ranged from 0.93 to 0.99, while it ranged from 0.73 to 0.94 for allele frequencies of 0.45 and 0.45. This is because the proportion of double heterozygotes in the population decreases when the minor allele frequencies at the two loci decrease, which reduces the extra information provided by the haplotype-based method. This is in agreement with [13]. There was also an interaction between the level of LD and the minor allele frequency: the ratio of precision increased when the level of LD increased, and this increase was larger for higher values of the minor allele frequency (Fig. 4). The ratio of precision at allele frequencies of 0.05 and 0.05 was 0.91 when $r^2$ was 0.2 and 0.99 for an $r^2$ of 0.9. However, the corresponding values for allele frequencies of 0.45 and 0.45 were 0.70 when $r^2$ was 0.2 and 0.94 for an $r^2$ of 0.9. Thus, with extreme allele frequencies at the loci (e.g. 0.05 and 0.05), both methods yielded similar results, irrespective of the level of LD. On the other hand, at intermediate allele frequencies, such as 0.45 and 0.45, the proportion of double heterozygotes in the population is larger, which increases the extra information provided by the haplotype-based method, particularly when LD is weak.
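The dependence of the double-heterozygote proportion on allele frequency can be illustrated with a short calculation. The sketch below is an illustration under stated assumptions, not part of the original analysis: both lines are given the same haplotype frequencies, the disequilibrium is taken to be positive, and the function names are hypothetical.

```python
import math


def haplotype_freqs(p_m, p_n, r2):
    """Haplotype frequencies (f00, f01, f10, f11); assumes positive D."""
    d = math.sqrt(r2 * p_m * (1 - p_m) * p_n * (1 - p_n))
    f11 = p_m * p_n + d
    f10 = p_m - f11
    f01 = p_n - f11
    f00 = 1 - f11 - f10 - f01
    return f00, f01, f10, f11


def double_heterozygote_freq(freqs_a, freqs_b):
    """Frequency of crossbreds that are heterozygous at both loci."""
    a00, a01, a10, a11 = freqs_a
    b00, b01, b10, b11 = freqs_b
    coupling = a00 * b11 + a11 * b00   # haplotype pair 00/11
    repulsion = a01 * b10 + a10 * b01  # haplotype pair 01/10
    return coupling + repulsion


low = haplotype_freqs(0.05, 0.05, 0.2)   # extreme allele frequencies
high = haplotype_freqs(0.45, 0.45, 0.2)  # intermediate allele frequencies

dh_low = double_heterozygote_freq(low, low)
dh_high = double_heterozygote_freq(high, high)
```

Under these assumptions `dh_high` is several times larger than `dh_low`, which is consistent with phase information mattering more at intermediate allele frequencies.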
Fig. 4

Ratio of precision for sample size of 900 for selected scenarios

In real applications, the true $r^2$ is unknown and the $r^2$ computed using haplotype data would serve as the reference value. In that case, the $r^2$ computed using unphased genotype data would be compared with the estimate based on haplotype data. In this case, the average absolute bias across the 180 scenarios using unphased genotype data was very close to zero (0.00017) and the average standard deviation of estimates based on unphased genotype data across all scenarios relative to haplotype data was 0.0026. In addition, the haplotype-based method assumes that the haplotype can be determined without error for each individual, which means that in reality the absolute bias may be lower than the above value of 0.00017, depending on the error of haplotype estimation. Thus, estimates of $r^2$ computed using unphased genotype and haplotype data are indistinguishable in terms of both bias and precision in practice, particularly with sufficient sample size. This paper extends the work of Rogers and Huff [8] and Weir [14], who showed that LD can be estimated from unphased genotype data when the allele frequency is the same in lines $A$ and $B$, and when the inbreeding coefficient is identical for the two loci. Here, we showed that LD can also be estimated using unphased genotype data when the allele frequencies differ between lines $A$ and $B$ and the inbreeding coefficients differ between the two loci. This is particularly relevant for hybrids in plant breeding [15] and for crossbreds in animal breeding [16, 17].

Conclusions

This work shows that the expectations of estimates of linkage disequilibrium (LD) between loci based on unphased genotypes and on haplotypes in F1 crossbreds are identical. Estimates of LD, i.e. $r^2$, are more precise and less biased when based on haplotype data than when based on unphased genotype data. For both unphased genotype and haplotype data, the precision of the estimates of $r^2$ increases and their bias decreases as sample size increases. More importantly, the difference in precision and bias between estimates of $r^2$ using haplotype and unphased genotype data decreases as sample size increases. Thus, LD in a crossbred population can be estimated using unphased genotype data with little bias and good precision, particularly with sufficient sample size.

1.  Prediction of total genetic value using genome-wide dense marker maps.

Authors:  T H Meuwissen; B J Hayes; M E Goddard
Journal:  Genetics       Date:  2001-04       Impact factor: 4.562

Review 2.  The use of molecular genetics in the improvement of agricultural populations.

Authors:  Jack C M Dekkers; Frédéric Hospital
Journal:  Nat Rev Genet       Date:  2002-01       Impact factor: 53.242

Review 3.  Mapping genes for complex traits in domestic animals and their use in breeding programmes.

Authors:  Michael E Goddard; Ben J Hayes
Journal:  Nat Rev Genet       Date:  2009-06       Impact factor: 53.242

4.  Linkage disequilibrium between loci with unknown phase.

Authors:  Alan R Rogers; Chad Huff
Journal:  Genetics       Date:  2009-05-11       Impact factor: 4.562

Review 5.  Linkage disequilibrium and association mapping.

Authors:  B S Weir
Journal:  Annu Rev Genomics Hum Genet       Date:  2008       Impact factor: 8.929

6.  Linkage disequilibrium in finite populations.

Authors:  W G Hill; A Robertson
Journal:  Theor Appl Genet       Date:  1968-06       Impact factor: 5.699

Review 7.  Linkage disequilibrium in humans: models and data.

Authors:  J K Pritchard; M Przeworski
Journal:  Am J Hum Genet       Date:  2001-06-14       Impact factor: 11.025

8.  A Scale-Corrected Comparison of Linkage Disequilibrium Levels between Genic and Non-Genic Regions.

Authors:  Swetlana Berger; Martin Schlather; Gustavo de los Campos; Steffen Weigend; Rudolf Preisinger; Malena Erbe; Henner Simianer
Journal:  PLoS One       Date:  2015-10-30       Impact factor: 3.240
