Literature DB >> 30068523

Recursive Algorithms for Modeling Genomic Ancestral Origins in a Fixed Pedigree.

Chaozhi Zheng¹, Martin P Boer², Fred A van Eeuwijk².

Abstract

The study of gene flow in pedigrees is of strong interest for the development of quantitative trait loci (QTL) mapping methods in multiparental populations. We developed a Markovian framework for modeling ancestral origins along two homologous chromosomes within individuals in fixed pedigrees. A highly beneficial property of our method is that the size of state space depends linearly or quadratically on the number of pedigree founders, whereas this increases exponentially with pedigree size in alternative methods. To calculate the parameter values of the Markov process, we describe two novel recursive algorithms that differ with respect to the pedigree founders being assumed to be exchangeable or not. Our algorithms apply equally to autosomes and sex chromosomes, another desirable feature of our approach. We tested the accuracy of the algorithms by a million simulations on a pedigree. We demonstrated two applications of the recursive algorithms in multiparental populations: design a breeding scheme for maximizing the overall density of recombination breakpoints and thus the QTL mapping resolution, and incorporate pedigree information into hidden Markov models in ancestral inference from genotypic data; the conditional probabilities and the recombination breakpoint data resulting from ancestral inference can facilitate follow-up QTL mapping. The results show that the generality of the recursive algorithms can greatly increase the application range of genetic analysis such as ancestral inference in multiparental populations.

Entities: Disease Species

Keywords: Ancestral inference; Collaborative Cross (CC); Identical by descent; Junction theory; MPP; Multiparent Advanced Generation InterCross (MAGIC); QTL mapping resolution; Recombinant Inbred Line (RIL); multiparental populations

Mesh：

Year: 2018 PMID： 30068523 PMCID： PMC6169389 DOI： 10.1534/g3.118.200340

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Many complicated experimental crosses have recently been produced for mapping QTL, particularly in plants (e.g., Kover ; Sannemann ). In contrast to traditional biparental populations, multiple parents have been used to increase genetic diversity and thus QTL segregation probability, and many generations of intercross mating have been used to increase accumulated recombination breakpoints in sampled offspring in the final generation and thus QTL mapping resolution. The expected density of recombination points is one of key quantities for optimizing experimental designs prior to collecting genotypic and phenotypic data (Rockman and Kruglyak 2008). Furthermore, the identification of recombination breakpoints from genotypic data provides useful information for increasing detection power and mapping resolution (Xu 2013; Li ). The primary aims of this paper are to develop the theory of gene flow in a fixed pedigree with arbitrary structure to calculate the prior distribution of recombination points, and to apply the theory for ancestral inference and detection of recombination points from genotypic data in multiparental populations. The theory of one-locus gene flow in a pedigree has been well developed. The haploid genomes of pedigree founders are designated by distinct labels, called founder genome labels (FGLs). A set of genes are identical by descent (IBD) if they carry the same FGLs. The identity state for two genes is either IBD or non-IBD. The IBD probability for two distinct genes within an individual is the inbreeding coefficient, and it is the kinship coefficient for the two genes randomly sampled from two distinct individuals. Rostron (1978) developed a recursive algorithm for calculating the two-gene coefficients of inbreeding and kinship. Karigl (1981) developed a recursive algorithm for calculating the coefficients of the fifteen gene identity states for four genes between two individuals. Thompson (1983) showed that a similar algorithm applies to the probabilities of joint descent of multiple genes from a specific FGL. Bauman and Zhou developed a recursive algorithm for calculating the ancestral coefficients such as the probability of one gene descending from a given FGL and the probability of two genes descending from two given FGLs (distinct or not). The theory of gene flow at two linked loci in a pedigree has also been developed. Weir and Cockerham (1969) and Cockerham and Weir (1973) developed a recursive algorithm for calculating the two-locus inbreeding coefficient of an individual that is defined as a linear function of the probabilities of the fifteen identity states for four genes, two at each of two linked loci. Thompson (1988) developed a recursive algorithm for calculating the two-locus kinship between two individuals defined as the probability of IBD at both loci, see Hill and Weir (2007) for calculating multi-locus IBD probabilities in random mating populations. In contrast to the identity coefficients, the ancestral coefficient at two loci is defined as the two-locus diplotype probabilities, and there are possible diplotypes for four genes in an individual with respect to L distinct FGLs. Thus, the calculation of the two-locus ancestral inferences has been developed only for simple breeding systems including selfing, brother-sister (or sibling) mating, and parent-offspring mating (Haldane and Waddington 1931; Johannes and Colome-Tatche 2011; Broman 2012). There has been much interest in developing the theory of gene flow in a pedigree with chromosomes assumed to be a genomic continuum. Donnelly (1983) developed a formal mathematical framework, where the crossover processes in a chromosome pedigree follow jointly a continuous time Markov random walk on the vertices of a hypercube, the time parameter being position along homologous chromosomes. Here a vertex represents one of possible gene transmissions from founders to all non-founders, and the number of vertices in a d-dimensional hypercube is , d being the number of non-founders. This framework has been used in many kinds of small pedigree systems (e.g., Bickeböller and Thompson 1996; Stefanov 2000; Martin and Hospital 2011). However, it is impractical to apply Donnelly’s Markovian framework to a large complex pedigree since the number of states (hypercube vertices) increases exponentially with the pedigree size. Alternatively, the theory of junctions has been developed to study the gene flow for a genomic continuum (Fisher 1949, 1954). A junction is defined as a boundary point (recombination breakpoint) on a chromosome where two distinct FGLs meet. Fisher (1949, 1954) and Bennett (1953, 1954) developed the theory of junctions for simple breeding systems including selfing, brother-sister mating, and parent-offspring mating. It has been extended to populations (Stam 1980; Baird ; Chapman and Thompson 2003; MacLeod ; Zheng ; Zheng 2015). In particular, Zheng and Zheng (2015) developed a Markovian framework for modeling ancestral origins within an individual with the state space being the possible pairs of FGLs, and thus the number of states is much smaller than that of Donnelly’s Markovian framework (Donnelly 1983). On the other hand, our Markovian framework (Zheng ; Zheng 2015) is restricted by two assumptions: the mating scheme from one generation to the next is random mating, and the FGLs are assumed to be exchangeable so that the probability distribution of ancestral origins in offspring is invariant to all possible permutations of the FGLs. However, the exchangeability generally does not hold for a mapping population produced via an arbitrary breeding pedigree. In this paper, we relax the restriction of our previous Markovian framework by extending it to a fixed pedigree. We first describe a recursive algorithm (denoted by EXCH) under the assumption of FGLs being exchangeable, which is applicable for the construction of Markovian framework in simple breeding schemes. Then we develop a recursive algorithm (denoted by NON-EXCH) for modeling ancestral origins to relax the exchangeability assumption. The two recursive algorithms apply to both autosomes and sex chromosomes if they exist. The results of both recursive algorithms are compared with those from extensive simulations on a classical pedigree. We first apply the two algorithms to compare different population designs (or breeding pedigrees) prior to experiments, for example, in terms of the overall density of junctions (recombination breakpoints), an important factor of QTL mapping resolution. In addition, the non-exchangeability can be illustrated under various breeding designs. Then we apply the two algorithms to incorporate pedigree information for ancestral inference from genotypic data in simulated and real collaborative cross (CC) populations, resulting in conditional probabilities that are necessary for downstream QTL mapping.

Recursive Algorithm Exch

Notations and overview

The symbols are briefly explained in Table 1, and some of them are illustrated in Figure 1. Pedigree members with unspecified mother and father are called the founders of the pedigree, and the other members are non-founders. The recursive algorithm presupposes that pedigree members are ordered under the constraint that parents always precede children. We denote by subscripts a, b, c the members of a pedigree, and denote by individual a comes after individual b We denote by superscripts m the maternally derived genes or chromosomes, and p for the paternally derived. We denote by superscripts o, , , and the unspecified parental origins (m or p) of genes or chromosomes. For the sake of brevity, denotes a horizontal piece-wise equation, so that and for the autosomes of individual a, XX chromosomes of female a, and XY chromosomes of male a, respectively.

Table 1

List of symbols and their brief descriptions

Symbol	Description
1S	An indicator function that equals 1 if statement S is true and 0 otherwise
0S1S2	An indicator function that equals 0 if S1 and S2 are true and 1 otherwise.
o,o1,o2,o3	A gene or haplotype is maternally (=m) or paternally (=p) derived
a,b,c	Pedigree members
♂, ♀	Father of a, mother of a
φabmp(11)	Probability of IBD between maternal gene of a and paternal gene of b
φabo1o2(11)	Probability of IBD between two genes of ab with parental origins o1o2
φabo1o2(12)	Probability of non-IBD between two genes of ab with parental origins o1o2
φabco1o2o3(123)	Probability of non-IBD among three genes of abc with parental origins o1o2o3
Ram	Expected junction density (per Morgan) on the maternal chromosome of a
Rao	On the chromosome of a with parental origin o
(g1g2g3g4)	A junction type denoted by two-locus four-gene identity state: g1g2 (g3g4) are
	on the left-hand (right-hand) side of junction, and g1g3 and g2g4 are two haplotypes.
	Seven types: 1112, 1121, 1122, 1211, 1213, 1222, and 1232
Jabo1o2(g1g2g3g4)	Expected density of junction type (g1g2g3g4), where haplotype g1g3 of a has
	parental origin o1 and haplotype g2g4 of b has parental origin o2
ρaamp	Expected overall junction density for a. ρaamp=Ram+Rap−Jaamp(1122)
Ω	A set of distinct FGLs assigned to the founders of a pedigree, i,j,k∈Ω
i≠j≠k	FGLs i,j,k differ from each other so that i≠j, j≠k, and i≠k
φam(i)	Probability that the maternal gene of a has FGL i
φao(i)	Probability that the gene of a with parental origin o has FGL i. ∑i∈Ωφao(i)=1
φabmp(ij)	Probability that the maternal gene of a has FGL i and the paternal gene of b has FGL j.
φabo1o2(ij)	Probability that two genes of ab with parental origins o1o2 have FGLs ij
	φabo1o2(12)=∑i,j∈Ω,i≠jφabo1o2(ij)
φabco1o2o3(ijk)	Probability that three genes of abc with parental origins o1o2o3 have FGLs ijk.
	φabco1o2o3(123)=∑i,j,k∈Ω,i≠j≠kφabco1o2o3(ijk)
Rao(ij)	Expected junction density (per Morgan) on chromosome of a with parental origin o,
	where the left-hand (right-hand) side of junction has FGL i(j≠i). Rao=∑i,j∈Ω,i≠jRao(ij)
Jabo1o2(ijkj)	Expected junction density: the four genes have FGLs ijkj, haplotype ik of a has parental origin o1,
	and haplotype jj of b has parental origin o2. Jabo1o2(1232)=∑i,j,k∈Ω,i≠j≠kJabo1o2(ijkj).
	Similarly for Jabo1o2(iiij), Jabo1o2(iiji), Jabo1o2(iijj), Jabo1o2(ijii), Jabo1o2(ijik), Jabo1o2(ijjj)

Figure 1

Illustration of some quantities. An identity state may correspond to many ancestral states. Different FGLs are shown by different colors, and the irrelevant chromosomes are shown as gray. (A) Some two- or three-gene ancestral coefficients and their corresponding identity coefficients. (B) Some expected ancestral junction densities and their corresponding expected identity junction densities. See Table 1 for brief explanations of these quantities. Pictorial representation of recurrence equations. (A) The equation for ancestral coefficient . (B) The equation for ancestral junction density The FGLs i, j, and k are shown by different colors, and the irrelevant chromosomes are shown as gray. The symbol denotes the mother of individual a, and see Table 1 for other notations. In the derivations of recurrence relations of a quantity, we will focus on a particular ordering if the quantity concerns genes in two individuals; we will always trace the maternally derived gene or chromosome within non-founder a back to its two parental genes or chromosomes if the maternally derived gene or chromosome concerns the quantity, and otherwise trace the paternally derived gene or chromosome. Throughout this paper, and always denote the father and mother of individual a, rather than any other, respectively. The identity coefficient denotes that two genes at a single locus have identity state where the first gene is in individual a and has parental origin and the second gene is in individual b and has parental origin (Figure 1A). There are only two two-gene identity states: IBD (11) or non-IBD (12), and Similarly, we denote by the three-gene identity coefficient with the identity state being the non-IBD state which is required together with two-gene identity coefficients for deriving the recursive relations for junction densities. An expected identity junction density is defined as the expected number of the specified type of recombination breakpoints per Morgan along two homologous chromosomes. Here expectation concerns the stochasticity of gene flow from founders to descendants on a fixed pedigree. And identity indicates that only the identity patterns (not specific FGLs) on the two sides of junctions matter. In the recursive algorithm NON-EXCH, we will introduce ancestral junction densities where the specific FGLs do matter. For example, denotes that the four genes, two at each side of a breakpoint along two chromosomes, have genetic identity state (Figure 1B). Here the first and third genes are on the left and right sides of a breakpoint, respectively, and they are in individual a and have parental origin And the second and fourth genes are on the left and right sides, respectively, and they are in individual b and have parental origin We adopt the notation of Nadot and Vayssiex (1973) for an identity state: the genes are labeled by natural integers starting 1, and the same integer is assigned to the gene that is IBD with a previous gene. Following the previous framework (Zheng ; Zheng 2015), we model ancestral origins along two homologous chromosomes within an individual by a continuous time Markov process, which can be described by an initial distribution π of ancestral origins and a rate matrix Q. Under the assumption of exchangeability among FGLs, the initial distribution π of individual a is determined by the identity coefficient and the rate matrix Q of individual a is determined by the five expected identity junction densities and see Zheng (2015) for an illustrative construction of rate matrix Q from the five junction densities. In the following, we describe a recursive algorithm for calculating the identity coefficients and the expected junction densities for an individual in a given pedigree. In short, let be an indicator function that equals 1 if statement S is true and 0 otherwise, and let be an indicator function that equals 0 if statements and are true and 1 otherwise.

Identity coefficients

The recurrence relations for the two- and three-gene identity coefficients are similar to the recursive algorithm of Karigl (1981). However, the calculation proceeds by tracing back genes instead of individuals, see also e.g., Jacquard (1974), Nadot and Vayssiex (1973), Denniston (1974), and Garcia-Cortes (2015). Thus, we can account for the asymmetry between maternally and paternally derived chromosomes, and account for autosomes and sex chromosomes simultaneously. Throughout this paper, we assume that there are no crossovers between X and Y sex chromosomes within a male. In addition, we focus on non-IBD state instead of IBD state to simplify the recurrence relations. We derive the recurrence relations according to the Mendelian inheritance rules: (1) a maternally (or paternally) derived autosomal gene descends from the two genes within the mother (or father, respectively) with equal probability (2) similar to (1) for a maternally derived X-linked gene, and (3) a paternally derived sex-linked gene within a female (or male) must descend from the X-linked (or Y-linked, respectively) gene of the father. The recurrence relations of the two-gene identity coefficient for non-founder a are given byfor Here the first equation follows from the rules 1 and 2 for the maternally derived gene within individual a, the second equation follows from the rules 1 and 3 for the paternally derived gene within individual a, and the last equation is obtained by reversing the ordering of two genes. The symmetries such as apply to three-gene identity coefficients by permuting the three genes, and thus we consider only a particular ordering and we havefor where the first (second) equation traces the maternally (or paternally, respectively) derived gene within individual a. The boundary conditions of these recurrence relations are given according to the assignment of FGLs to the founders in a pedigree. If we assign distinct FGLs to each haploid genome of an outbred founder, all multi-gene non-IBD probabilities are 1 if any pair of genes is not from the same haploid genome, and 0 otherwise. Four-gene or higher order identity coefficients may be derived similarly, but they are not required for the derivations of junction densities.

Expected identity junction densities

Let be the maternal (paternal) map expansion, the expected density of recombination breakpoints on the maternally (paternally) derived chromosome of individual a. The implicit two-gene identity state for the map expansions is It holds thatwhere and according to the reversibility of chromosome direction (Zheng 2015). The expected overall junction density for individual a is given by In the following, we derive the recurrence relations for the five expected junction densities and from which and can be derived according to the above equations of map expansions. The recurrence relations for the two map expansions of non-founder a are relatively simple, and they are given by (Zheng 2015)according to the theory of junctions that a new identity junction is formed whenever a recombination event occurs between two homologous chromosomes that are non-IBD at the location of a crossover (Fisher 1954). Since chromosomes are modeled as a genomic continuum, the shared junction type (1122) between two chromosomes can be formed only by duplicating the chromosome segment harboring the recombination breakpoint. For the expected junction density of non-founder a, we havefor where and the last equation is obtained by reversing the ordering of the two haplotypes. In the first equation, refers to the same chromosome in individual a, and thus equals by definition. We trace the maternally derived haplotype within non-founder a back to its two parental haplotypes if the maternally derived haplotype concerns the junction density, and otherwise track the paternally derived haplotype. We derive the recurrence relations for and jointly. The recombination breakpoint for occurs on the first chromosome of a homologous pair, and it is on the second chromosome for The junction type becomes when reversing the ordering of two haplotype of junction type We havefor There are no contributions from the three-gene non-IBD probabilities in the equations for and because crossovers between the two parental haplotypes of the tracing haplotype within individual a are invisible. The boundary conditions are given by the assignment of FGLs to the founders. The within- and between-founder identity junction densities are 0 if a single FGL is assigned to the whole haploid genome of each founder.

Recursive Algorithm Non-Exch

Let denote the set of distinct FGLs assigned to the founders of a pedigree. We adopt the same notations in the recursive algorithm with FGL exchangeability except for the following changes. As shown in Figure 1, we replace identity states by ancestral states: and for the two-chromosome expected junction densities, and and explicitly for the map expansions, where are different from each other (denoted by ). We represent a two-gene ancestral state by and a three-gene ancestral state by where are not necessary different from each other. In addition we introduce one-gene ancestral coefficient the probability that the maternally or paternally derived gene in individual a carries FGL The identity coefficients can be obtained by summing the corresponding ancestral coefficients. For example, Similar relationships hold between the expected identity junction densities and the expected ancestral junction densities, for example, and Similarly, the initial distribution π of the Markov chain can be constructed from the two-gene ancestral coefficients, and the rate matrix Q can be constructed from the expected ancestral junction densities, without assuming the exchangeability of FGLs.

Ancestral coefficients

Bauman and Zhou derived the recurrence relations of the one- and two-gene ancestral coefficients for autosomes by tracing two distinct genes simultaneously in their parents. We derive equivalent recurrence relations for both autosomes and sex chromosomes by tracing one gene once in its parent, instead of simultaneously tracing two genes within an individual. In addition, we derive the recurrent relations of the three-gene ancestral coefficients that are required in the recurrence relations of the expected ancestral junction densities. The relations of the ancestral coefficients are very similar to those of the identity coefficients, and they are based on the same rules of Mendelian inheritance. For example for the one-gene ancestral coefficient of non-founder a, we haveSee Appendix A for the recurrence relations of the two-gene ancestral coefficient and three-gene ancestral coefficient The boundary conditions of the recurrence relations of the ancestral coefficients are given by the FGLs assigned to the founders in a pedigree.

Expected ancestral junction densities

The ancestral map expansions can be expressed in terms of the expected ancestral junction densities as followsHere refers to the junctions on the single chromosome of a with parental origin o, while the terms on the right-hand side refer to the junctions on the two homologous chromosomes of a. We have and where the equal signs are based on the reversibility of chromosome direction, and the unequal signs refer to the non-exchangeability of FGLs i and j. The recurrence relations for the ancestral map expansions are given bywhere the contributions of the two-gene ancestral coefficients account for the asymmetry of FGLs i and j, for example, A new ancestral junction is formed whenever a recombination event occurs between two homologous chromosomes that have the unordered FGLs i and j at the location of a crossover. The recurrence relations for are the same as those for except that identity states are replaced by ancestral states. We havefor where the last equation is obtained by reversing the ordering of the two haplotypes. See Appendix B for the recurrence relations of , and As before, the within- and between-founder ancestral junction densities are 0 if a single FGL is assigned to the whole haploid genome of each founder.

Data availability

Ancestral inference using the two recursive algorithms is implemented in the previous developed RABBIT software, where the function magicReconstruct is extended to incorporate pedigree information. RABBIT is available at https://github.com/chaozhi/RABBIT.git, and it is offered under the GNU Affero general public license, version 3 (AGPL-3.0). Example scripts for using the recursive algorithms, simulating genotypic data, and ancestral inference are included. The real CC data have been described by Durrant .

Application to Multiparental Populations

Simulation evaluation

Before applying the two recursive algorithms EXCH and NON-EXCH to multiparental populations, we evaluate their accuracy by forward simulations on the classical pedigree of Native Americans (Figure 3) (Chapman and Jacquard 1971). The pedigree has been previously used for evaluating recursive algorithms (Jacquard 1974; Karigl 1981; Garcia-Cortes 2015). We simulate two linkage groups: one pair of homologous autosomes and one pair of sex chromosomes. We assign FGLs to the haploid or diploid genomes of the founders. Each descendant gamete is specified as a list of FGL segments determined by chromosomal crossovers. The number of crossovers in a linkage group of a gamete follows a Poisson distribution with mean 1. We assume no genetic interference so that the positions of crossovers are independently and randomly distributed across the chromosomes. We obtain simulated results for the pedigree member “M22” (Figure 3) by averaging over simulations.

Figure 3

The pedigree of Native Americans. It consists of 6 founders and 16 non-founders. Circles denote females, and rectangles for males.

Table 2 shows the comparisons between the numerical results from the recursive algorithm EXCH and the simulated results. A unique FGL is assigned to each haploid genome of each founder, so that in total twelve distinct FGLs are assigned to the fixed founders. The differences between the numerical and simulated results are less than 0.002, which is very likely due to the stochasticity of gene flow from founders to descendants. The identity coefficient for autosomes is in agreement with the previous result (Karigl 1981).

Table 2

The identity coefficients and the expected identity junction densities for pedigree member ”M22”

Quantity ^a	AA autosomes		XX autosomes
	Numerical	Simulated	Numerical	Simulated
φaamp(11)	0.18359	0.18324	0.23437	0.23459
Jaamp(1122)	0.18359	0.18371	0.17969	0.17969
Jaamp(1211)	0.61572	0.61516	0.35938	0.35978
Jaamp(1213)	2.93652	2.93462	1.47656	1.47587
Jaamp(1222)	0.61572	0.61605	0.57031	0.57047
Jaamp(1232)	2.93652	2.93824	1.71094	1.71227

A unique FGL is assigned to each haploid genome of each founder in Figure 3.

A unique FGL is assigned to each haploid genome of each founder in Figure 3. The pedigree of Native Americans. It consists of 6 founders and 16 non-founders. Circles denote females, and rectangles for males. Table 3 evaluates the recursive algorithm NON-EXCH by the forward simulations. A unique FGL is assigned to the whole diploid genome of each founder, so that in total six distinct FGLs are assigned to the fixed founders. Here we use a different assignment of FGLs because Table 3 was otherwise too large. The results show that founder “J” does not contribute to the maternally derived autosome and X chromosome in offspring “M22”, which is relatively straightforward from the pedigree structure in Figure 3. The differences between the numerical and simulated results, including those for and are less than 0.001.

Table 3

The ancestral coefficients and the expected ancestral junction densities for pedigree member ”M22”

Quantity ^a	AA autosomes		XX autosomes
	Numerical	Simulated	Numerical	Simulated
φam(i)
A	0.21875	0.21885	0.125	0.12507
B	0.21875	0.21874	0.125	0.12467
J	0	0	0	0
L	0.125	0.12487	0.25	0.25078
P	0.3125	0.31241	0.25	0.24939
C	0.125	0.12513	0.25	0.25009
Ram(ij)
AB	0.31250	0.31272	0.14062	0.14098
AJ	0	0	0	0
AL	0.09375	0.09348	0.09375	0.09364
AP	0.28906	0.28880	0.15625	0.15602
AC	0.11719	0.11768	0.09375	0.09330
BJ	0	0	0	0
BL	0.09375	0.09405	0.09375	0.09384
BP	0.28906	0.28890	0.15625	0.15626
BC	0.11719	0.11742	0.09375	0.09331
JL	0	0	0	0
JP	0	0	0	0
JC	0	0	0	0
LP	0.15625	0.15644	0.18750	0.18748
LC	0.03125	0.03124	0.12500	0.12422
PC	0.10938	0.10922	0.18750	0.18735

The label of each founder is used as FGL that is assigned to its diploid genomes.

The label of each founder is used as FGL that is assigned to its diploid genomes. We confirmed that the numerical results from the algorithm NON-EXCH are reduced to those from the algorithm EXCH, by summing over possible FGLs under the given identity pattern. In addition, we confirmed the consistency between the two algorithms in the multiparental populations shown in Figure 4.

Breeding design

We apply the recursive algorithm EXCH to multiparental populations, which have become attractive for QTL mapping in many animal and plant species. Specifically, we study the inbreeding process by calculating the IBD probability and measure the QTL mapping resolution by calculating the expected overall junction density We focus on multi-way funnel breeding schemes (Figure 4). In the funnel scheme, the founders of each line are randomly permuted, and each line is produced by a intercross scheme that combines all founder genomes through several generations of pairwise crosses prior to repeated generations of inbreeding by e.g., sibling mating. The alternating backcross and the father-daughter backcross have been studied by Welsh and McMillan (2012) for accelerating inbreeding by a simulation approach.

Figure 4

Illustration of different 8-way funnel breeding schemes. The alternating backcross scheme alternates between mother-son and father-daughter matings in generation and the father-daughter backcross alternates between father-daughter and random sibling matings in generation Circles denote females, and rectangles for males. Figure 5A and B show the IBD probability and the expected overall junction density as a function of generation for the recombinant inbred lines (RIL) by selfing and the RIL+ by selfing. These plant breeding schemes have been adopted in many recently produced crop multiparent advanced generation intercross (MAGIC) populations (e.g., Huang ; Mackay ; Pascual ). The RIL+ by selfing has one additional generation of intercrossing instead of selfing in the RIL scheme (Teuscher and Broman 2007). Thus, the RIL+ scheme has smaller IBD probability (Figure 5A), and has about 1 per M larger overall junction density in a given later generation ()(Figure 5B).

Figure 5

The identity coefficient and the expected overall junction density for the breeding schemes in Figure 4. (A-B) The results for selfing mating schemes 1-2 in Figure 4. (C-D) The results for the autosomes of mating schemes 3-5 with backcross starting from in Figure 4. (E-F) The results for the sex chromosomes. Figure 5C and E show that the IBD probabilities for the alternating backcross and the father-daughter backcross are the same as those of the RIL by sibling for autosomes and they are smaller than those of the RIL by sibling for sex chromosomes. Welsh and McMillan (2012) measured the inbreeding process by the number of generations to reach complete fixation of genome as homozygous, and showed that the number of generation increases from the alternating backcross to the father-daughter backcross and to the RIL by sibling. The differences in the number of generation can be partially because the differences of the IBD probabilities for sex chromosomes, and partially because the complete fixation refers to one individual for the alternating backcross and two individuals for the father-daughter backcross. Figure 5D and F show that the expected overall junction densities for the alternating backcross and the father-daughter backcross are lower than those of the RIL by sibling for autosomes and sex chromosomes, and that the junction densities for the alternating backcross are slightly smaller than those of the father-daughter backcross for autosomes. These results are consistent with those of Welsh and McMillan (2012): the number of chromosome segments in the final inbred lines increases from the alternating backcross to the father-daughter backcross and to the RIL by sibling.

FGL exchangeability

We apply the recursive algorithm NON-EXCH to study the prior FGL exchangeability for the multi-way funnel breeding schemes (Figure 4). We assume that all founder parents are fully inbred, and assign FGLs from A to H to the eight inbred parents in order from left to right. To examine the FGL exchangeability, we calculate the ancestral coefficient and the expected ancestral junction density the latter being the only ancestral junction type for complete inbred individuals. Figure 6 shows the exchangeability in terms of and for the females in generation 11, and 22 for the RIL by sibling. The left panels show the results for autosomes. The FGL non-exchangeability confirms the pedigree inconsistency introduced by Liu : each of the four mating pairs of founder parents is impossible at a single locus in an individual, that is, for or and The FGL non-exchangeability shows a different pattern for there are three levels of expected junction densities and the four mating pairs of founder parents have the highest values. For both and the non-exchangeability diminishes in generation 22 with almost complete inbreeding.

Figure 6

The FGL non-exchangeability patterns for the ancestral coefficient and the expected ancestral junction density in the eight-way RIL by sibling. The FGLs of eight founders are A-H from left to right.

The FGL non-exchangeability patterns for the ancestral coefficient and the expected ancestral junction density in the eight-way RIL by sibling. The FGLs of eight founders are A-H from left to right. The right panels of Figure 6 show the exchangeability patterns for sex chromosomes of the RIL by sibling. The founder parents D, G, and H cannot pass their X chromosomes beyond F1, and thus their FGLs are impossible in generation Similar to autosomes, for or and . For or the probability of FGL C is around twice as large as that of A, B, E or F in generation because the X chromosome carrying FGL C in generation 1 is inherited with probability 1, whereas each of the X chromosomes carrying FGLs A, B, E, and F in generation 1 is inherited with probability 1/2. Similarly, the values of involving FGL C increase with inbreeding generation, they are about twice as large as others in generation 22. We examine the exchangeability patterns for the other multi-way funnel breeding schemes, and focus on autosomes since the FGL exchangeability does not hold in general for sex chromosomes because of the asymmetry between X and Y chromosomes. The non-exchangeability of disappears in a completely inbred individual for all 4- 8- and 16-way breeding schemes, because of random chromosomal segregations over many inbreeding generations. However, the FGL exchangeability of often does not hold even after for a completely inbred individual, for example, see Figure 7 for the breeding schemes after 100 generations of selfing inbreeding.

Figure 7

The FGL non-exchangeability patterns of the expected ancestral junction density The results are for autosomes of the multi-way RIL by 100 generations of selfing or sibling. The FGLs for founder parents are letters starting from A up to P from left to right.

Ancestral inference

Either one of the two recursive algorithms can be used as a prior to incorporate pedigree information for ancestral inference in multiparental populations from genotypic data, which will be illustrated as follows by simulated and real CC populations.

Simulated CC:

We simulate a CC population (Churchill ) with 100 independent funnels, and take the female at the last generation. The SNP data for the eight founder mouse strains are from Collaborative Cross Consortium (2012), with marker density SNPs per cM. The genotypic data of a sample female are obtained by simulating the FGLs forwardly and then combining them with the founder SNP data. Allelic errors are assumed to occur independently, and we simulate genotypic data of founders and sampled individuals with the same allelic error probability 0.005. We analyze four simulated data sets: F11-AA, F11-XX, F22-AA, and F22-XX, where the first part denotes the generation, and the second part denotes the 19 pairs of autosomes (AA) or the sex chromosomes (XX). We perform two analyses for each data set by applying each of the two recursive algorithms as the prior for modeling ancestral origins along two homologous chromosomes within an individual. See Figure 6 for the prior distributions of and as a part of results obtained from the algorithm NON-EXCH. The true allelic error probability is used for the genotypic data of founders and sampled individuals. Table 4 shows the evaluation of the prior FGL exchangeability on ancestral inference in the simulated CC population. The two analyses of the data set F22-AA are almost the same because there is approximately no prior FGL exchangeability for the autosomes in generation and the improvement of NON-EXCH over EXCH for F11-AA is because of the FGL non-exchangeability such as that of in Figure 6. The improvements for sex chromosomes are larger than those for autosomes because of the more pronounced FGL non-exchangeability (Figure 6). The improvement for the data set F22-XX is , and it increases to for F11-XX.

Table 4

Evaluation of the prior FGL exchangeability on ancestral inference in the simulated CC population consisting of 100 funnels

Probability	Simulated data set	EXCH	NON-EXCH	Improvement (%)^a
Wrongly assigned^b	F11-AA	0.02779	0.02715	2.3
	F11-XX	0.01821	0.01650	9.4
	F22-AA	0.02484	0.02476	0.3
	F22-XX	0.01540	0.01452	5.7
Wrongly called^c	F11-AA	0.01962	0.01888	3.8
	F11-XX	0.01284	0.01175	8.5
	F22-AA	0.01766	0.01771	−0.3
	F22-XX	0.01170	0.01084	7.4

The percentage decrease of the wrongly assigned (or called) probability for the analysis using the algorithm NON-EXCH, relative to the algorithm EXCH.

One minus the posterior probability of being true ancestral states.

One minus the fraction of called ancestral states being true ancestral states. At each SNP location within an individual, the ancestral state is called by its maximum posterior probability.

The percentage decrease of the wrongly assigned (or called) probability for the analysis using the algorithm NON-EXCH, relative to the algorithm EXCH. One minus the posterior probability of being true ancestral states. One minus the fraction of called ancestral states being true ancestral states. At each SNP location within an individual, the ancestral state is called by its maximum posterior probability.

Real CC:

The real CC population consists of 120 lines that are sampled at generation t in the range of (Durrant ). For each line, we estimate the sampling generation by using the algorithm EXCH and taking the generation with the maximum likelihood, where the funnel code is not required. Then we estimate the funnel code by using the algorithm NON-EXCH and an iterative maximum likelihood, where a funnel code is proposed by slightly disturbing the current funnel code and it is accepted if the likelihood is increased. We compare the algorithms EXCH and NON-EXCH with GAIN (Liu ), the latter being specifically developed for the CC. To study the effect of marker densities on ancestral inference, we analyze only the first pair of homologous autosomes and thin the full dataset by taking every second SNP markers, and repeat to obtain nested sub datasets. The data fractions or the relative marker densities are given by The absolute maximum marker density is 145 SNPs per cM, times higher than that of simulated CC. From the full dataset we calculate the marginal posterior probabilities by NON-EXCH, EXCH, and GAIN, and set the pseudo-true values to the most probable ancestral origins if they are the same among the three methods. Overall the pseudo-true ancestral origins for observed genotypes are obtained. Figure 8 shows that the results on ancestral inferences are only slightly different among the three methods. The results of GAIN and NON-EXCH contain no pedigree inconsistencies, whereas NON-EXCH assigns ancestral origins to the impossible mating pairs of founder parents with probability around 0.005. The wrongly assigned probability for GAIN is larger than those of NON-EXCH and EXCH, and the difference increases with the decreasing marker density. The ranking of the wrong called probability from lowest to highest is NON-EXCH, EXCH, and GAIN, and the improvement of NON-EXCH over EXCH is around and the improvement of NON-EXCH over GAIN is around when relative density The similar performances between algorithms NON-EXCH and EXCH indicate that the prior FGL exchangeability is a reasonable approximation for autosomes in the CC.

Figure 8

Evaluation of the recursive algorithms that are applied in the ancestral inference for the first pair of autosomes of the 120 real CC lines. Panels (A-C) show the results for the wrongly assigned probability, the wrongly called probability, and the pedigree inconsistency, respectively. The pedigree inconsistency is measured by the sum of the posterior probabilities over the four mating pairs of founder parents.

Discussion

We have developed two novel recursive algorithms for modeling genomic ancestral origins along two homologous chromosomes in a pedigree with arbitrary but known structure. The algorithms apply to both autosomes and sex chromosomes when they exist, and allow selfing in the pedigree. The extensive simulations on a real example pedigree show that the numerical results from the two recursive algorithms are consistent with the simulated results, apart from the stochasticity of gene flow. The Markov property is assumed for the ancestral origin process along two chromosomes, and thus genetic interference is assumed to be absent. The assumption does not affect the expected junction densities calculated from the two recursive algorithms, although it does affect the distribution of inter-junction distances including the variance of junction densities. One important application of the recursive algorithms is to design a breeding scheme for accelerating the inbreeding process to obtain immortal lines and for maximizing the resolution of mapping QTL. Welsh and McMillan (2012) studied inbreeding processes among different types of multiparental crosses by simulations, and it takes approximately 5.5 hr to complete 100,000 simulations of eight-way RILs. In contrast, it takes less than one second for our recursive algorithms. Rockman and Kruglyak (2008) compared different intercross breeding designs in multiparental RILs by simulations, aiming at increasing the density of junctions (recombination breakpoints) and thus fine-mapping resolution. Our previous recursive algorithms (Zheng ; Zheng 2015) can be used for calculating the junction density in random mating schemes, and the two new algorithms extend the calculation for any breeding schemes with fixed pedigrees. The two recursive algorithms can also be used in random mating schemes by applying them to many pedigrees that are simulated according to specified mating schemes and averaging the results, which would still require less number of replicates and less computational time than simulation studies. The second important application of the two recursive algorithms is to provide an appropriate way of incorporating pedigree information for analyzing genotypic data in bi- or multiparental populations. Specifically, the two recursive algorithm can be used to calculate the process parameter values of hidden Markov models for genotypic data, that is, the prior probability distribution of ancestral origins (FGLs) at an initial site and the prior transition probability matrix describing how ancestral origins change along two homologous chromosomes within an individual. See Figure 1 of Zheng (2015) for an example. The application to the ancestral inference in the CC shows that the new algorithms implemented in RABBIT performs only slightly better than GAIN (Figure 8). However, GAIN is specifically designed for the CC, and our previous algorithm applies only to breeding schemes with stage-wise random mating. The new recursive algorithms have pronounced advantages of generality and computational efficiency, and they apply to arbitrary breeding pedigrees that are far away from random mating. For example, one of the multiparental barley populations consists of backcrossing, half-diallel crossing, and selfing, where one inbred founder is much stronger represented in the offspring lines (Liller ). Evaluation of the recursive algorithms that are applied in the ancestral inference for the first pair of autosomes of the 120 real CC lines. Panels (A-C) show the results for the wrongly assigned probability, the wrongly called probability, and the pedigree inconsistency, respectively. The pedigree inconsistency is measured by the sum of the posterior probabilities over the four mating pairs of founder parents. Furthermore, we have applied the two recursive algorithms to incorporate pedigree information for genotype imputation in multiparental populations (Zheng ), and the application for genetic linkage map construction in multiparental populations is under development.

39 in total

1. Distribution of genome shared identical by descent by two individuals in grandparent-type relationship.

Authors: V T Stefanov
Journal: Genetics Date: 2000-11 Impact factor: 4.562

2. Mixed effects models for quantitative trait loci mapping with inbred strains.

Authors: Lara E Bauman; Janet S Sinsheimer; Eric M Sobel; Kenneth Lange
Journal: Genetics Date: 2008-09-14 Impact factor: 4.562

3. Distribution of genome shared IBD by half-sibs: approximation by the Poisson clumping heuristic.

Authors: H Bickeböller; E A Thompson
Journal: Theor Popul Biol Date: 1996-08 Impact factor: 1.570

4. Potential of a tomato MAGIC population to decipher the genetic control of quantitative traits and detect causal variants in the resequencing era.

Authors: Laura Pascual; Nelly Desplat; Bevan E Huang; Aurore Desgroux; Laure Bruguier; Jean-Paul Bouchet; Quang H Le; Betty Chauchard; Philippe Verschave; Mathilde Causse
Journal: Plant Biotechnol J Date: 2014-11-07 Impact factor: 9.803

5. Accurate Genotype Imputation in Multiparental Populations from Low-Coverage Sequence.

Authors: Chaozhi Zheng; Martin P Boer; Fred A van Eeuwijk
Journal: Genetics Date: 2018-07-25 Impact factor: 4.562

6. Pedigree mating with two linked loci.

Authors: B S Weir; C C Cockerham
Journal: Genetics Date: 1969-04 Impact factor: 4.562

7. On the computation of inbreeding coefficients.

Authors: J Rostron
Journal: Ann Hum Genet Date: 1978-05 Impact factor: 1.670

8. Genotype probabilities at intermediate generations in the construction of recombinant inbred lines.

Authors: Karl W Broman
Journal: Genetics Date: 2012-02 Impact factor: 4.562

9. Construction of high-quality recombination maps with low-coverage genomic sequencing for joint linkage analysis in maize.

Authors: Chunhui Li; Yongxiang Li; Peter J Bradbury; Xun Wu; Yunsu Shi; Yanchun Song; Dengfeng Zhang; Eli Rodgers-Melnick; Edward S Buckler; Zhiwu Zhang; Yu Li; Tianyu Wang
Journal: BMC Biol Date: 2015-09-21 Impact factor: 7.431

10. An eight-parent multiparent advanced generation inter-cross population for winter-sown wheat: creation, properties, and validation.

Authors: Ian J Mackay; Pauline Bansept-Basler; Toby Barber; Alison R Bentley; James Cockram; Nick Gosman; Andy J Greenland; Richard Horsnell; Rhian Howells; Donal M O'Sullivan; Gemma A Rose; Phil J Howell
Journal: G3 (Bethesda) Date: 2014-09-18 Impact factor: 3.154

4 in total