Chin Jian Yang1, Rodney N Edmondson2, Hans-Peter Piepho3, Wayne Powell1, Ian Mackay1,4. 1. Scotland's Rural College (SRUC), Edinburgh EH9 3JG, UK. 2. Rana House, Wellesbourne, Warwick CV35 9LS, UK. 3. Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart 70593, Germany. 4. IMplant Consultancy Ltd., Chelmsford, UK.
Abstract
Multiparental Advanced Generation Inter-Cross (MAGIC) populations are valuable crop resources with a wide array of research uses including genetic mapping of complex traits, management of genetic resources and breeding of new varieties. Multiple founders are crossed to create a rich mosaic of highly recombined founder genomes in the MAGIC recombinant inbred lines (RILs). Many variations of MAGIC population designs exist; however, a large proportion of the currently available populations have been created empirically and based on similar designs. In our evaluations of five MAGIC populations, we found that the choice of designs has a large impact on the recombination landscape in the RILs. The most popular design used in many MAGIC populations has been shown to have a bias in recombinant haplotypes and low level of unique recombinant haplotypes, and therefore is not recommended. To address this problem and provide a remedy for the future, we have developed the "magicdesign" R package for creating and testing any MAGIC population design via simulation. A Shiny app version of the package is available as well. Our "magicdesign" package provides a unifying tool and a framework for creativity and innovation in MAGIC population designs. For example, using this package, we demonstrate that MAGIC population designs can be found which are very effective in creating haplotype diversity without the requirement for very large crossing programs. Furthermore, we show that interspersing cycles of crossing with cycles of selfing is effective in increasing haplotype diversity. These approaches are applicable in species that are hard to cross or in which resources are limited.
Multiparental Advanced Generation Inter-Cross (MAGIC) populations are valuable crop resources with a wide array of research uses including genetic mapping of complex traits, management of genetic resources and breeding of new varieties. Multiple founders are crossed to create a rich mosaic of highly recombined founder genomes in the MAGIC recombinant inbred lines (RILs). Many variations of MAGIC population designs exist; however, a large proportion of the currently available populations have been created empirically and based on similar designs. In our evaluations of five MAGIC populations, we found that the choice of designs has a large impact on the recombination landscape in the RILs. The most popular design used in many MAGIC populations has been shown to have a bias in recombinant haplotypes and low level of unique recombinant haplotypes, and therefore is not recommended. To address this problem and provide a remedy for the future, we have developed the "magicdesign" R package for creating and testing any MAGIC population design via simulation. A Shiny app version of the package is available as well. Our "magicdesign" package provides a unifying tool and a framework for creativity and innovation in MAGIC population designs. For example, using this package, we demonstrate that MAGIC population designs can be found which are very effective in creating haplotype diversity without the requirement for very large crossing programs. Furthermore, we show that interspersing cycles of crossing with cycles of selfing is effective in increasing haplotype diversity. These approaches are applicable in species that are hard to cross or in which resources are limited.
The Multiparental Advanced Generation Inter-Cross (MAGIC) population was initially proposed in crops by Mackay and Powell (2007) as a highly recombined population derived from multiple founders. The MAGIC term is largely relevant in plants; however, the concept was derived from the mapping approach using genetically heterogeneous stock in mice (Mott ) and is very close to the Collaborative Cross (CC) population in mice (Churchill ). The first MAGIC population was produced using 19 founders in Arabidopsis thaliana (Kover ). The MAGIC pedigree described by Cavanagh has served as a foundation for the design of many MAGIC populations in subsequent years. Briefly, the MAGIC pedigree shows a single funnel going from eight founders to a recombinant inbred line (RIL). Starting with eight founders labeled as A to H, two-way crosses are made as (A × B), (C × D), (E × F), and (G × H). Next, four-way crosses are made as [(A × B) × (C × D)] and [(E × F) × (G × H)]. Finally, eight-way crosses are made as (((A × B) × (C × D)) × ((E × F) × (G × H))) followed by several generations of selfing. Using this crossing scheme, the end of the funnel is a RIL with its genome composed of contributions from all eight founders. Alternatively, a MAGIC population design may involve multiple funnels like the elite wheat MAGIC population by Mackay . Regardless of the designs, MAGIC RILs have diverse recombination landscape and rich mosaics of founder genomes (Scott ).Over the years, MAGIC populations have been used in various studies with great success. MAGIC populations are popular choices in mapping quantitative trait loci (QTLs) due to their high mapping power and resolution (Valdar ), for examples resistance QTLs in bread wheat (Stadlmeier ), cold tolerance QTLs in maize (Yi ), and high-throughput phenotype QTLs in rice (Ogawa ). In addition to single-trait analyses, multivariate analyses (multi-trait or -environment) have been demonstrated in MAGIC populations (Scutari ; Verbyla ). Diouf used a tomato MAGIC population to dissect the underlying genetic-by-environment (G × E) and plasticity for climate adaptation traits. Aside from QTL mapping, MAGIC populations are valuable resources for genomic selection owing to their properties of highly recombined genomes and large population size (Scott ). Following that, there are opportunities for using MAGIC RILs in breeding new varieties (Bandillo ; Li ). With large numbers of founders, MAGIC populations also provide a dynamic asset for the management of genetic resources (Thépot ) and may be used to improve our understanding of crop adaptation (Scott ). Given their longevity with a broad array of uses, MAGIC populations are an invaluable community resource for creative and impactful research.Considering the importance of MAGIC populations in crop research, several previous studies have explored their designs. Ladejobi investigated a genetic diversity-based approach in founder selection and compared the distributions of recombinant haplotypes within small interval for several MAGIC population designs. Zheng calculated recombination densities in several MPP designs and showed that higher recombination densities can be achieved by increasing the number of crossing generations. Similar work in mice also showed that additional crossing and maintenance generations increase the number of recombinations in RILs (Valdar ). Other simulation studies that compared several MPP designs suggest that MAGIC-like populations are better for generating high number of recombinations, and smaller nonrecombining genomic bin region (Rockman and Kruglyak 2008; Klasen ). The development of MAGIC populations has been summarized recently in reviews by Huang , Arrones , and Scott .Many of the past works have resulted in software for simulating and identifying recombinations, as well as calculating QTL mapping power and resolution. However, there is none dedicated for the design and test of novel MAGIC crossing schemes. The “GA” R package and GeneDrop software (Ladejobi ) can be used to calculate founder genetic diversity and simulate recombinations, but neither is capable of producing a MAGIC pedigree. The RABBIT software (Zheng ), similar to its counterparts HAPPY (Mott 2008), “mpMap” R package (Huang and George 2011), and “qtl2” R package (Broman 2019), can be used to determine founder genotypes and thus identify recombination breakpoints in MAGIC, but it still does not create a MAGIC pedigree. Given that MAGIC has applications beyond QTL mapping, it would be beneficial to explore novel crossing schemes that require minimum effort to construct populations suited for wide array of uses.Here, we have sought to understand the relationship between MAGIC population designs and population recombination landscape. We selected and analyzed five MAGIC populations with publicly available marker genotypes for the founders and RILs, genetic map positions and pedigrees. The selected populations comprise the UK wheat eight-founder (Mackay ), German wheat 8-founder (Sannemann ), cowpea eight-founder (Huynh ), tomato eight-founder (Pascual ), and UK wheat 16-founder (Scott ) MAGIC populations. These MAGIC populations were created from different designs. We contrasted the observed recombinant haplotypes to expected (simulated) recombinant haplotypes in each population. A comparable cross-population analysis can be challenging due to many variables like genome sizes, marker genotyping platforms and numbers of founders. Fortunately, there are two elite wheat eight-founder MAGIC populations (Mackay ; Sannemann ) that were genotyped with the same 90k SNP array (Wang ), which allowed us to directly compare the two populations in greater depth. We found that the recombination landscape varies across designs and that this variation is consistent across species.Following our results, we have created the “magicdesign” package in R Core Team (2021) for the purpose of creating and testing different MAGIC population designs. Three major steps are involved in the package pipeline: design creation, population simulation and comparative analysis. Users can create a design by either specifying input variables or providing a custom pedigree. Once a design is created, “magicdesign” converts it into a crossing scheme from the founders to final RILs and simulates a population based on the crossing scheme. After multiple iterations of simulation, distributions of recombinant haplotypes and founder genomes are summarized. Results from one or more designs can be combined and compared visually in plots. In addition, “magicdesign” produces a pedigree in both text and plot formats that can be used as a guide to support crossing work in practice. Aside from the described roles, “magicdesign” serves as a tool to advance the use of MAGIC in future multiparental populations.
Materials and methods
Evaluation of MAGIC population designs
We surveyed all available MAGIC populations that have been published to date (including pre-prints) and identified five populations with publicly available marker data. These five MAGIC populations include wheat with eight UK elite founders (Mackay ), wheat with eight German elite founders (Sannemann ), cowpea with eight founders (Huynh ), tomato with eight founders (Pascual ), and wheat with 16 UK diverse founders (Scott ). These populations are referred to as wheat-UK8, wheat-DE8, cowpea, tomato, and wheat-UK16, respectively (Table 1). These datasets were chosen because the marker data for the founders and RILs, genetic map positions, and pedigree are publicly available. The wheat-UK16 population is an exception as it has founder dosages to compensate for the lack of genetic map positions. Links to the source datasets are listed in the Data Availability section. Other publicly available datasets like the wheat with eight German founders (Stadlmeier ), wheat with eight Australian founders (Shah ), rice with eight founders (Raghavan ), maize with nine founders (Dell’Acqua ), and Arabidopsis with 19 founders (Kover ) are excluded because at least one component of the data needed for our purpose is not present.
Table 1
Summary of five analyzed MAGIC populations
Dataset
n
Genome (cM)
Marker
Distance (cM/marker)
PRHR
Reference
Wheat-UK8
8
5,262
5,138
1.024
0.296 ± 0.094
Mackay et al. (2014)
Wheat-DE8
8
5,262
5,138
1.024
0.331 ± 0.058
Sannemann et al. (2018)
Cowpea
8
979
32,114
0.030
0.674 ± 0.207
Huynh et al. (2018)
Tomato
8
2,156
1,345
1.603
0.410 ± 0.106
Pascual et al. (2015)
Wheat-UK16
16
5,262
1,065,178
0.005
0.799 ± 0.092
Scott et al. (2021)
The wheat-UK8 and wheat-DE8 datasets have been reduced to share the same markers and maps for comparison. The PRHR is calculated as number of recombinant haplotypes in actual dataset divided by number of recombinant haplotypes in simulated dataset. PRHR is shown as mean ± standard deviation.
Summary of five analyzed MAGIC populationsThe wheat-UK8 and wheat-DE8 datasets have been reduced to share the same markers and maps for comparison. The PRHR is calculated as number of recombinant haplotypes in actual dataset divided by number of recombinant haplotypes in simulated dataset. PRHR is shown as mean ± standard deviation.All five chosen MAGIC populations vary in numbers of RILs and marker density (Table 1). The original wheat-UK8 dataset is made of 643 RILs and 18,599 markers while the original wheat-DE8 dataset is made of 910 RILs and 7579 markers. To maintain a fair comparison between these two populations, we kept only 5138 markers that are common between wheat-UK8 and wheat-DE8. In wheat-DE8, missing data were previously imputed to numerical mean (twice the allele frequency). These imputed marker data cannot be used in “qtl2” (Broman ) for calculating founder genotype probabilities, so we reverted the imputed marker data by converting any noninteger marker data to missing. The cowpea dataset is made of 305 RILs and 32,114 markers after removing 16 markers in the original dataset where the marker data is missing in at least one founder. The tomato dataset is made of 238 RILs and 1345 markers. The wheat-UK16 dataset is made of 504 RILs and 1,065,178 markers.
Identification of recombinant haplotypes in MAGIC populations
To identify recombinant haplotypes, the biallelic marker data in the RILs need to be converted into founder genotypes. For each dataset except wheat-UK16, we determined the founder genotypes in each RIL using the “qtl2” package (Broman ) in R (R Core Team 2021). We first calculated the founder genotype probabilities using calc.genoprob function with error probability of 0.01 (1%) and Haldane map function. Next, we inferred the founder genotypes from the probabilities using maxmarg function with minimum probability of 0.5001. We chose a slightly higher threshold than the previously used minimum probability of 0.5 by Gardner . Because the genotype probabilities for all founders at each RIL’s marker sum to 1, the threshold of 0.5001 eliminates the risk of the maxmarg function picking a founder genotype at random when there are two or more with the same probability above the threshold. For the wheat-UK16 dataset, the founder genotype dosages are readily available. These founder genotype dosages were calculated from STITCH (Davies ), which is a different software but uses the same underlying hidden Markov model (HMM) as “qtl2.” We inferred the founder genotypes in the wheat-UK16 dataset using an equivalent threshold of 1 because the estimated genotype dosages sum to 2. Markers without any founder genotype probabilities above the threshold were set to missing.Using the inferred founder genotypes in each dataset, we identified the recombinant haplotypes at each breakpoint. The recombinant haplotype is a combination of flanking founder genotypes at each breakpoint. For example, 1_5 is the recombinant haplotype at a breakpoint where the two flanking founder genotypes are founder 1 and 5. For a population with founders, there are individual recombinant haplotypes. Therefore, in an 8-founder MAGIC population, there are 56 individual recombinant haplotypes (1_2, 1_3, …, 8_6, 8_7). In order to summarize the results, we summed the counts of each individual recombinant haplotype in every RIL, and averaging the counts across all RILs to obtain the mean counts of individual recombinant haplotypes.As a control, we simulated a similar MAGIC population based on the original pedigree for each dataset and calculated the true counts of recombinant haplotypes. We first derived an approximated crossing scheme from the pedigree. Because the information on replicates is not always present in the pedigree, we assumed that no replicates and considered all funnels to be independent. Next, we used the “AlphaSimR” package (Gaynor ) in R (R Core Team 2021) to simulate the MAGIC populations for a total of 100 iterations. “AlphaSimR” is a package designed for simulating plant and animal breeding programs, and we used a fraction of its functionality to simulate crosses and recombinations. In order to keep track of founder genotypes, we expanded each marker into markers with the same exact genetic map position to prevent recombination among these markers. Using founders as example, the founders are coded as 000, 100, 010, and 001 across the three expanded markers. This expanded marker system tracked the true founder genotypes from the start to the end of simulation, and therefore allowed us to calculate the true counts of recombinant haplotypes using the same method described in the previous paragraph. In all datasets except for wheat-UK16, we used the same genetic map positions as the actual datasets. In wheat-UK16, we used equally spaced markers at 0.5 cM since the genetic map positions were not available for this dataset.We included an additional control using a hybrid approach of actual and simulated datasets. Specifically, we converted the founder genotypes in simulated RILs into biallelic marker data and inferred the founder genotypes using the same procedures as we did in the actual datasets. This approach was applied to all four MAGIC datasets except wheat-UK16. The counts of recombinant haplotypes identified from this approach provide an upper limit to the inferred number of founder genotypes using “qtl2” (Broman ). There is a caveat, however, that since our simulation is based on an approximated crossing scheme, the outcomes of this approach may not precisely represent the upper limit.
Determination of unique or identical recombinant haplotypes
Following the identified recombinant haplotypes in wheat-UK8 and wheat-DE8, we classified these recombinant haplotypes into unique or identical groups based on their positional overlaps within an interval. Recombinant haplotypes of the same founder pairs are considered identical if they fall within the same interval, otherwise unique if they do not overlap. The intervals are arranged in nonoverlapping bins of approximately 1 or 10 cM from the start to end of a chromosome. Identical recombinant haplotypes exist due to replications of cross progeny in the MAGIC population. While this classification cannot distinguish between identical and independent recombinant haplotypes within the interval, the probability of independent recombinant haplotypes is low and assumed equal between wheat-UK8 and wheat-DE8. Therefore, the results from this comparison can elucidate the effect of MAGIC population designs on the proportions of unique against identical recombinant haplotypes.
Minimum probability in calling founder genotypes
The minimum probability used in calling founder genotypes determines the power in identifying the correct founder genotypes in the RILs. We selected 10 thresholds ranging from 0.1 to 1.0 with an increment of 0.1. We applied each threshold to the maxmarg function in “qtl2” (Broman ) in simulated populations based on wheat-UK8 and wheat-DE8. These simulated populations are similar to the previously described hybrid approach where the simulated founder genotypes in RILs are converted to biallelic markers prior to calculating genotype probabilities. For each threshold, we computed the proportions of correct, incorrect or missing founder genotypes by comparing the inferred to true founder genotypes. The ideal threshold should have a high proportion of correct founder genotypes with low proportions of incorrect and missing founder genotypes.
Marker density in MAGIC population
We used the proportion of recombinant haplotype recovered (PRHR) to quantify the recombinations in MAGIC population that is captured in the marker data. We calculated the empirical PRHR in all five datasets as the counts of recombinant haplotypes in the actual dataset divided by the counts in the simulated dataset. Because the marker density is not constant along the genomes in actual datasets, we sought to determine a clearer relationship between PRHR and marker density via simulation. We simulated a single chromosome of 200 cM with eight founders crossed using the same design as wheat-UK8. The founder alleles were simulated based on the correlations among founders in wheat-UK8 using the rmvbin function in “bindata” package (Leisch ) in R (R Core Team 2021). We simulated a total of 4000 markers that are equally spaced at 0.05 cM. To test lower marker densities, we thinned the same simulated marker data to 0.10, 0.20, 0.40, 0.80, 1.60, 3.20, 6.40, 12.80 cM, respectively. For each marker density, we inferred the founder genotypes in the RILs using “qtl2” (Broman ) at a probability threshold of 0.5001 and calculated the counts of recombinant haplotypes. Finally, we obtained the PRHR by taking the counts of recombinant haplotypes for each marker density divided by the true counts.
Results
Classifications of MAGIC population designs
Variations in the MAGIC population designs can be described by the number of founders and the crossing scheme (Figure 1A). It is convenient to first consider two classes of designs based on the number of founders: power of two (P2) and nonpower of two (NP2). As the names suggest, the P2 class has founders for any whereas the NP2 class has and founders. P2 designs are generally easier to implement in practice because the numbers of individuals in a funnel are halved in every crossing generation. For either P2 or NP2 classes, the crossing scheme can be structured, unstructured and semi-structured (Figure 1A). A structured design involves strictly defined crosses among the founders and intermediates such that the crossing scheme can be further classified into full, partial balanced, partial unbalanced or basic designs. These designs are elaborated further in subsequent paragraphs. On the other hand, an unstructured design involves random crosses among the founders and intermediates, while a semi-structured design is a combination of structured and unstructured designs. Additional features of a structured design include: (1) precise tracing of the ancestry of each RIL back to its progenitors, (2) number of crossing generations is equal to rounded up to the nearest integer. These features may not hold true in unstructured or semi-structured designs.
Figure 1
Classifications of MAGIC population designs. (A) Flowchart of classifying MAGIC population designs based on their crossing schemes. (B) Distribution of MAGIC population designs in all 48 surveyed populations.
Classifications of MAGIC population designs. (A) Flowchart of classifying MAGIC population designs based on their crossing schemes. (B) Distribution of MAGIC population designs in all 48 surveyed populations.Within a structured design, there are two primary types based on the number of funnels: full and partial (Figure 1A). A full P2 design has funnels while a partial P2 design has one or more funnels but less than funnels. The numerator is the total number of permutations of 1 to founders, and the denominator is the total number of equivalent permutations by the MAGIC definition. Directions of crosses are disregarded in defining a funnel. The denominator can be described as for . In an example with four founders, 1234, 1243, 2134, 2143, 3412, 3421, 4312, and 4321 are all equivalent funnels. For simplicity, [(1 × 2) × (3 × 4)] is written as 1234. Full and partial types exist in NP2 designs although the number of funnels in a full design cannot be generalized similarly (Supplementary Table S1). From a practical perspective, a full P2 design is achievable for four or eight founders but not for 16 or more founders as the number of required funnels becomes unmanageable.Within a partial design, the funnels can be chosen in either a balanced or an unbalanced way (Figure 1A). A balanced design has an equal number of founders among the funnels and equal frequency of founder pairs at each crossing generation. In a four-founder MAGIC design, 1234, 1324, and 1423 form a set of three balanced funnels. First, each founder occurs thrice in the set of funnels. Second, each founder is paired once with another founder in the two-way crosses, and twice in the four-way crosses. For example, founder 1 meets founder 2 once in the two-way cross (first funnel) and twice in the four-way cross (second and third funnels). Coincidentally, since and , the set of three balanced funnels is equivalent to a full design for four founders. Unlike the partial balanced design where the number of funnels is restricted to set rules, the partial unbalanced design is formed by funnels chosen randomly. Differences between balanced and unbalanced designs are explored in a later section. In addition, we coin the special case of partial unbalanced design with one funnel as a basic design. Examples of all of the designs are shown in Supplementary Figure S1.Based on our survey of 48 MAGIC populations in 15 crop species that have been described in either published or pre-print literature to date, there are 39 P2 and 9 NP2 designs (Figure 1B and Supplementary Table S2). The numbers of founders range from four to 60, with four and eight founders as the predominant numbers. Despite the ease of handling required for crosses based on a full design with four founders, all 10 of the populations were created using a basic design. Of the 26 MAGIC populations with eight founders, there are 16 basic designs, nine partial designs and one semi-structured design. The popularity of the basic design can be ascribed to Cavanagh , who provided an illustrated pedigree of a basic design. We refrained from classifying the partial designs into balanced and unbalanced designs due to the lack of pedigree information in many MAGIC populations. Regardless of the number of founders, there has not been any MAGIC population created with a full design. There are several eight-founder populations that came close to a full design. The bread wheat MAGIC population by Mackay had 210 out of 315 required funnels for a full design. The maize MAGIC population by Dell’Acqua had mixed funnels from pooling different four-way individuals and had to introduce an additional founder due to a failed two-way cross. The three bread wheat MAGIC populations by Shah came closest to a full design with 311–313 funnels.
Empirical evaluation of two bread wheat MAGIC populations
Our evaluation on two bread wheat MAGIC populations derived from distinct sets of eight elite founders shows that the distributions of recombinant haplotypes differ for each MAGIC design (Figure 2 and Supplementary Figure S2). We used the wheat-UK8 and wheat-DE8 populations, in which wheat-UK8 is an example of a partial design while wheat-DE8 is an example of a basic design (Supplementary Table S2). To maintain our cross-population comparison as fair as possible, we reduced the original wheat-UK8 and wheat-DE8 datasets to smaller subsets with common markers (Figure 2), although the same analysis was performed on the original datasets too (Supplementary Figure S2). The subsets include all 643 RILs in wheat-UK8 and 910 RILs in wheat-DE8, and 5138 common markers arranged in the same genetic map positions as Gardner . This genetic map is chosen over the original genetic map in the wheat 90k array (Wang ) because of higher map quality.
Figure 2
Distributions of recombinant haplotypes in two wheat MAGIC populations. (A) Plot shows mean count of each recombinant haplotype in a single RIL in wheat-UK8. The boxplot shows mean count from true founder genotypes (100 simulated iterations). The red and blue points show mean count from inferred founder genotypes. (B) Plot shows mean count of each recombinant haplotype in a single RIL in wheat-DE8. (C) Histogram of the mean count in wheat-UK8. (D) Histogram of the mean count in wheat-DE8.
Distributions of recombinant haplotypes in two wheat MAGIC populations. (A) Plot shows mean count of each recombinant haplotype in a single RIL in wheat-UK8. The boxplot shows mean count from true founder genotypes (100 simulated iterations). The red and blue points show mean count from inferred founder genotypes. (B) Plot shows mean count of each recombinant haplotype in a single RIL in wheat-DE8. (C) Histogram of the mean count in wheat-UK8. (D) Histogram of the mean count in wheat-DE8.The distribution of all recombinant haplotypes is less skewed in wheat-UK8 than in wheat-DE8 (Figure 2). In wheat-UK8, none of the recombinant haplotype appears more frequently than others (Figure 2A). In any given RIL, there are 0.879 ± 0.227 (mean ± standard deviation) individual recombinant haplotypes. In wheat-DE8, eight recombinant haplotypes appear about twice as frequently as the others (Figure 2B). There are 1.910 ± 0.313 of these eight recombinant haplotypes (1_2, 2_1, 3_4, 4_3, 5_6, 6_5, 7_8, and 8_7) instead of 0.845 ± 0.150 of the other recombinant haplotypes. In addition, the mean count of recombinant haplotype is approximately normally distributed in wheat-UK8 (Figure 2C) but is skewed to the right in wheat-DE8 (Figure 2D). The eight skewed recombinant haplotypes match with all of the founder pairs in two-way crosses in wheat-DE8. This is not a coincidence because two-way crosses have the largest founder genomes to recombine. With every generation of crosses, the founder genomes are halved and so there are fewer recombinations between any two founders. Examples of the detrimental consequences of the skew in recombinant haplotypes are: (1) reduction in QTL mapping power and resolution when the pairs of founders with higher skew carry the same haplotypes surrounding the causative QTL, (2) limited novel haplotypes for breeding use.While wheat-UK8 has a slightly lower number of recombinant haplotypes per RIL than wheat-DE8 in both reduced (Table 2) and full (Supplementary Tables S3 and S4) datasets, the proportion of unique recombinant haplotypes is higher in wheat-UK8 than in wheat-DE8 (Figure 3 and Supplementary Figure S3). Due to the imprecision of inferred recombination breakpoints, we defined recombinant haplotypes with breakpoints within any nonoverlapping intervals as identical. We chose the intervals to be 1 and 10 cM wide. With the interval width set to 1 cM, there are 17,786 distinct recombinant haplotypes distributed among 643 RILs in wheat-UK8, which is equivalent to 27.66 distinct recombinant haplotypes per RIL. Similarly, there are 17,643 distinct recombinant haplotypes distributed among 910 RILs in wheat-DE8, which is equivalent to 19.39 distinct recombinant haplotypes per RIL. When the interval is set to 10 cM, the counts and proportions of unique recombinant haplotypes decrease and the differences between wheat-UK8 and wheat-DE8 holds (Supplementary Figure S3). There are many practical implications of having more unique recombinant haplotypes: (1) increased mapping power and resolution, (2) increased options of novel haplotypes for breeding, and (3) minimized redundancy of the same recombinant haplotypes. For any pair of founders that share a haplotype carrying linked QTLs, all recombinations between the two founders within the region are noninformative. Such haplotype can be broken down by recombinations between other pairs of founders, which is achievable by having more unique recombinant haplotypes. This is useful to avoid mapping ghost QTLs For example, a previously identified AOP2/AOP3 locus in Arabidopsis (Atwell ; Kerwin ) was recently re-mapped to two other linked loci, NDX1, and GA1 (Sasaki ).
Table 2
Number of informative recombinations in wheat-UK8 and wheat-DE8
Chr
NR/RIL
NR/RIL/M
Wheat-UK8
Wheat-DE8
Wheat-UK8
Wheat-DE8
Sim
Actual
Sim
Actual
Sim
Actual
Sim
Actual
1A
7.21
1.91
7.21
2.35
3.13
0.83
3.13
1.02
1B
11.77
3.57
11.76
4.31
3.44
1.04
3.44
1.26
1D
3.74
0.68
3.73
0.99
2.92
0.53
2.92
0.78
2A
8.37
3.67
8.37
3.17
3.32
1.45
3.32
1.26
2B
12.43
2.63
12.43
3.61
3.34
0.71
3.34
0.97
2D
4.97
1.69
4.96
1.42
2.71
0.92
2.71
0.77
3A
10.21
4.50
10.19
4.16
3.37
1.48
3.36
1.37
3B
9.89
3.35
9.90
4.18
3.51
1.19
3.51
1.48
3D
4.45
0.61
4.44
0.45
2.29
0.32
2.28
0.23
4A
6.95
2.51
6.96
2.11
3.30
1.19
3.30
1.00
4B
7.55
2.21
7.58
2.23
3.40
0.99
3.41
1.01
4D
2.95
0.51
2.96
0.43
2.75
0.47
2.75
0.40
5A
10.23
4.15
10.23
4.47
3.26
1.32
3.26
1.43
5B
10.97
2.59
10.96
3.25
3.53
0.83
3.53
1.05
5D
4.76
1.33
4.75
1.52
2.39
0.67
2.39
0.76
6A
9.70
3.38
9.68
4.48
3.48
1.21
3.47
1.61
6B
8.85
1.97
8.84
2.46
3.40
0.76
3.40
0.95
6D
3.92
0.43
3.93
1.14
1.82
0.20
1.83
0.53
7A
13.53
4.21
13.50
5.07
3.53
1.10
3.52
1.32
7B
9.28
2.64
9.27
2.97
3.23
0.92
3.23
1.03
7D
4.40
0.74
4.40
1.05
2.36
0.40
2.35
0.56
All
166.10
49.24
166.05
55.84
3.16
0.94
3.16
1.06
The number of informative recombinations (NR) is calculated for both simulated and actual wheat-UK8 and wheat-DE8 datasets. Note: recombinant inbred line (RIL), Morgan (M).
Figure 3
Distributions of unique and identical recombinant haplotypes in two wheat MAGIC populations. Recombinant haplotypes are considered identical if they are of the same founder pairs and present in the same 1 cM interval, otherwise unique. (A) Counts of the number of identical recombinant haplotypes in wheat-UK8. The left most point is the count of unique recombinant haplotypes. (B) Counts of the number of identical recombinant haplotypes in wheat-DE8. (C) Proportions of unique and nonunique (identical) recombinant haplotypes in wheat-UK8 and wheat-DE8.
Distributions of unique and identical recombinant haplotypes in two wheat MAGIC populations. Recombinant haplotypes are considered identical if they are of the same founder pairs and present in the same 1 cM interval, otherwise unique. (A) Counts of the number of identical recombinant haplotypes in wheat-UK8. The left most point is the count of unique recombinant haplotypes. (B) Counts of the number of identical recombinant haplotypes in wheat-DE8. (C) Proportions of unique and nonunique (identical) recombinant haplotypes in wheat-UK8 and wheat-DE8.Number of informative recombinations in wheat-UK8 and wheat-DE8The number of informative recombinations (NR) is calculated for both simulated and actual wheat-UK8 and wheat-DE8 datasets. Note: recombinant inbred line (RIL), Morgan (M).
Empirical evaluation of three other MAGIC populations
While not directly comparable, the relationship between MAGIC population designs and the distributions of recombinant haplotypes in three other datasets remains consistent. Similar to wheat-DE8, the cowpea and tomato MAGIC populations were created from a basic design and thus have a skewed distribution of recombinant haplotypes (Figure 4). The recombinant haplotypes from two-way founder pairs are higher than the other recombinant haplotypes. In cowpea, the two-way recombinant haplotypes are 0.936 ± 0.171 (mean ± standard deviation) per RIL while the other recombinant haplotypes are 0.384 ± 0.123 per RIL. In tomato, the two-way recombinant haplotypes are 0.907 ± 0.095 per RIL and the other recombinant haplotypes are 0.416 ± 0.116 per RIL. On the other hand, wheat-UK16 was created from a partial balanced design and does not have any skew in its distribution of recombinant haplotypes (Supplementary Figure S4). The recombinant haplotypes are 0.878 ± 0.102 per RIL.
Figure 4
Distributions of recombinant haplotypes in cowpea and tomato MAGIC populations. (A) Plot shows mean count of each recombinant haplotype in a single RIL in cowpea. The boxplot shows mean count from true founder genotypes (100 simulated iterations). The red and blue points show mean count from inferred founder genotypes. (B) Plot shows mean count of each recombinant haplotype in a single RIL in tomato.
Distributions of recombinant haplotypes in cowpea and tomato MAGIC populations. (A) Plot shows mean count of each recombinant haplotype in a single RIL in cowpea. The boxplot shows mean count from true founder genotypes (100 simulated iterations). The red and blue points show mean count from inferred founder genotypes. (B) Plot shows mean count of each recombinant haplotype in a single RIL in tomato.
Minimum probability for calling founder genotype
The minimum probability for calling founder genotype is important for the identification of recombinant haplotypes, and our simulation results suggest that the range of 0.4–0.6 gives a good balance of correct, incorrect and missing founder genotype calls (Figure 5, A and B). This range is in accordance to the threshold of 0.5 used in Gardner . The results are similar between simulated wheat-UK8 and wheat-DE8 populations, so only results from the simulated wheat-UK8 population are elaborated here. At a minimum probability of 0.4, the correct, incorrect and missing founder calls are 69%, 16%, and 15% of the total markers, respectively. At a minimum probability of 0.5, the rates are 64%, 11%, and 25%. At a minimum probability of 0.6, the rates are 58%, 6%, and 36%. As the minimum probability increases, the rates of correct and incorrect founder calls decrease while the missing rate increases. In order to avoid the issue of having two or more founder probabilities above the threshold, the minimum probability can be set to 0.5 or higher. Because the simulations are based on the available diversity among the wheat-UK8 and wheat-DE8 founders, the appropriate range of minimum probability for calling founder genotype may vary in other populations.
Figure 5
Ideal threshold for inferring founder genotypes and marker density in MAGIC population. (A) Proportions of correct, incorrect and missing founder genotypes inferred at different minimum probability (minprob) in simulated wheat-UK8 population. (B) Proportions of correct, incorrect and missing founder genotypes inferred at different minprob in simulated wheat-DE8 population. (C) Proportions of recombinant haplotypes recovered (PRHR) at different marker density along a simulated chromosome of 200 cM. The marker density is adjusted by having markers equally spaced at 0.05, 0.10, 0.20, 0.40, 0.80, 1.60, 3.20, 6.40, and 12.80 cM apart.
Ideal threshold for inferring founder genotypes and marker density in MAGIC population. (A) Proportions of correct, incorrect and missing founder genotypes inferred at different minimum probability (minprob) in simulated wheat-UK8 population. (B) Proportions of correct, incorrect and missing founder genotypes inferred at different minprob in simulated wheat-DE8 population. (C) Proportions of recombinant haplotypes recovered (PRHR) at different marker density along a simulated chromosome of 200 cM. The marker density is adjusted by having markers equally spaced at 0.05, 0.10, 0.20, 0.40, 0.80, 1.60, 3.20, 6.40, and 12.80 cM apart.In all five analyzed datasets, the proportion of recombinant haplotypes recovered (PRHR) is higher in populations genotyped at higher marker density (Table 1). PRHR is computed by taking the number of recombinant haplotypes in actual dataset divided by the true number of recombinant haplotypes in simulated dataset. Therefore, high PRHR ensures that fine-scale recombinations are captured and increases QTL mapping resolution. In wheat-UK8 and wheat-DE8 with an average marker distance of 1.024 cM, the PRHR is approximately one-third (Figure 2). Even though the tomato population is genotyped at a lower marker density with an average marker distance of 1.603 cM, the PRHR is higher than in the two wheat populations (Figure 4B). This is likely because the markers on the wheat D-genome are generally sparse due to its low diversity (Akhunov ). The cowpea population is genotyped at a high marker density with an average marker distance of 0.030 cM, in which the PRHR is approximately two-thirds (Figure 4A). Finally, wheat-UK16 is genotyped at the highest marker density of all analyzed datasets with an average marker distance of 0.005 cM, and it has the highest PRHR of almost 80% (Supplementary Figure S4). Marker density is an important factor in identifying fine-scale recombination breakpoints in MAGIC populations.Under ideal conditions, where the markers are evenly spaced, a marker distance of 0.20 cM or less between two adjacent markers is sufficient to achieve a PRHR of at least 90% (Figure 5C). We tested recombinant haplotype recovery rates for markers that are evenly spaced across 0.05, 0.10, 0.20, 0.40, 0.80, 1.60, 3.20, 6.40, and 12.80 cM. At the smallest tested distance of 0.05 cM, approximately 97% of the true recombinant haplotypes can be recovered. As the distance increases, the recovery rate decreases. At the largest tested distance of 12.80 cM, approximately 11% of the true recombinant haplotypes can be recovered. These results are more optimistic than the actual results (Table 1). In practice, more markers are required to achieve the same recovery rate for any given marker density because markers are not evenly distributed across the whole genome. In addition, the discrepancy between simulated and actual results can also be attributed to marker quality. For example, the markers on the wheat D-genomes are generally sparser than on the others.
magicdesign: a tool to create and test MAGIC population designs
Given that MAGIC population construction requires a lot of time and effort, and that design choices can impact population attributes, there is a need for a “free trial” before committing to create a MAGIC population. Here, we introduce an R package called “magicdesign,” which is specifically made for creating and testing various MAGIC population designs via simulation. Alternatively, we also provide a user-friendly Shiny app version called “magicdesignee” which implements the “magicdesign” R package in its back-end. Therefore, minimal R knowledge is required for users to use “magicdesignee.”Briefly, the “magicdesign” package workflow can be described as: (1) design creation, (2) population simulation, and (3) comparative analysis. In the design creation step, the package creates a crossing scheme that spans from the founders to the final RILs based on user inputs. In the population simulation step, the package simulates a MAGIC RIL population constructed from the crossing scheme, and repeats over multiple iterations. At this point, the first two steps may be repeated for other MAGIC population designs. Finally, in the comparative analysis step, the package extracts information from previously tested designs and summarizes the results illustratively. Additional details on each step are described in subsequent sections.
Design creation
In a structured design, the design creation step takes various user inputs to create a crossing scheme. The major inputs include number of founders, number of funnels or funnel sets, and a balanced design indicator. Based on how these inputs are specified, one of the structured designs (Full, Partial Balanced, Partial Unbalanced, and Basic) as shown in Figure 1A is created. As defined previously, a balanced design has an equal number of founders among the funnels and equal frequency of founder pairs at each crossing generation. This design creation step works for either power of 2 (P2) or nonpower of 2 (NP2) number of founders. Currently, the allowed range of number of founders is any integer between 3 and 128. The allowed number of funnels or funnel sets varies according to the number of founders and the balanced design indicator, and the full list is provided in Supplementary Table S1.Finding a balanced design requires more computation power than finding an unbalanced design. This is because the balanced design requires many funnel permutations to be evaluated while the unbalanced design randomly sample the required number of funnel permutations. To reduce the computational burden, we have identified alternative methods that are less computationally intensive. In the case of eight founders, we have searched through all 3157 possible combinations and identified 720 partial balanced funnel sets. There are seven funnels to make a minimum partial balanced funnel set and any of the 315 funnels from a full design can be chosen to fill each of the seven funnels in a funnel set. Furthermore, each of the partial balanced funnel set can be combined with another nonoverlapping partial balanced funnel set to form a larger partial balanced funnel set. In the case of 16 founders, the number of possible combinations is very large and so we opted for a different approach. To start, we obtained the 15 funnels from Scott , which is a partial balanced set for 16 founders. We searched through all 315 possible permutations of 8- and 16-way crosses in these funnels and identified 7776 partial balanced funnel sets. More partial balanced funnel sets could be found by searching through all 31515 possible permutations of four-way crosses, however, that was beyond our available computational capacity. Unlike the case of eight founders, these funnel sets do overlap and thus cannot be combined to form a larger set. Instead, by randomly swapping the founders from a starting partial balanced funnel set, a nonoverlapping partial balanced funnel set can be created and merged to form a larger set. For other numbers of founders between 4 and 16, the balanced design is created based on a nested incomplete block design (NIBD) generated using the “blocksdesign” package (Edmondson 2020, 2021). A MAGIC funnel is analogous to a NIBD as the founders in two-way crosses (experimental block of two plots) are nested within four-way crosses, founders in four-way crosses are nested within eight-way crosses, and so on. Currently, a balanced design in “magicdesign” is limited to 16 or less founders as there is not yet an efficient method for larger number of founders.In addition, “magicdesign” provides options to further modify the MAGIC population design by specifying the number of replicates, number of selfing generations, and an additional crossing indicator. The number of replicates determines how many seeds from a cross are retained. This can help to increase the haplotype diversity in the MAGIC population when the seeds are not genetically identical. In the case of inbred founders, replicates of two-ways individuals are all identical but not replicates of four-ways (or higher) individuals. The number of selfing generations determine how many generations of selfing are required after each cross. Typically, the selfing step is only applied after the last crossing generation as a way to reduce heterozygosity in the RILs. However, selfing prior to that may be beneficial in increasing recombinant haplotypes. Finally, the additional crossing indicator allows for an extra crossing generation to further increase recombinant haplotypes. This is similar to the approach taken by Stadlmeier and Shah .Alternatively, any MAGIC population design that is not available directly in “magicdesign” can be created by supplying a complete pedigree. The only requirement for the pedigree is that it must detail all crosses involved from the founders to the final RILs. This option provides a greater flexibility to accommodate for semi-structured or unstructured designs. Furthermore, it is also possible to modify a design created from “magicdesign” and provide the pedigree of the modified design.
Population simulation
Once a MAGIC population design is created, “magicdesign” simulates a population based on the design and other user inputs. The major inputs include distance between markers, chromosome genetic lengths, number of simulations and recombinant haplotype interval size. The simulation step will create evenly spaced markers based on the distance between markers and chromosome genetic lengths. All founders are considered unique and so each of these markers is used to encode for the founder genotypes. The desired number of simulations is selected. In addition, the recombinant haplotype interval size determines the distance between two markers to look for recombinant haplotypes.
Comparative analysis
After simulating one or more designs, the final step is to compare the design qualities in terms of recombinant haplotype proportions and distribution of founder genomes in the RILs. In general, a good MAGIC population design should yield consistently higher recombinant haplotype proportions as well as an even distribution of founder genomes compared to other designs.To demonstrate comparative analysis with “magicdesign,” the five designs in Table 3 are used as examples. These designs are all applied to a fictitious species with five chromosomes of 1.0, 1.5, 2.0, 2.5, 3.0 Morgans (M) length. All five designs are created based on a MAGIC population of eight founders. Design 1 is a full design and so it has all 315 funnels. Designs 2 and 4 are both partial balanced design with one funnel set (seven funnels), and the only difference between them is that the four-way individuals in design 4 are selfed once before making eight-way crosses. Design 3 is similar to design 2 except it is a partial unbalanced design with seven funnels. Finally, design 5 is a basic design with 1 funnel inspired by the design used in Stadlmeier . The numbers of replicates are varied for each design to achieve similar final RIL population size close to 1000. Aside from design 1 which has the highest number of crosses, the other designs have fairly similar numbers of crosses. Designs 1, 2, and 3 require seven generations from founders to RILs, while designs 4 and 5 require eight generations because of the additional selfing and crossing generation, respectively.
Table 3
Five MAGIC population designs tested in magicdesign
Design 1
Design 2
Design 3
Design 4
Design 5
Founders
8
8
8
8
8
Type
Full
Partial balanced
Partial unbalanced
Partial balanced
Basic
Replicates
1, 1, 3
1, 9, 15
1, 9, 15
1, 9, 15
1, 4, 4, 15
Selfing
0, 0, 4
0, 0, 4
0, 0, 4
0, 1, 4
0, 0, 0, 4
Crosses
28, 210, 315 (553)
28, 14, 63 (105)
19, 13, 63 (95)
28, 14, 63 (105)
4, 2, 4, 64 (74)
Generations
7
7
7
8
8
RIL
945
945
945
945
960
Funnel
315
7
7
7
1
The numbers of replicates, selfing generations and crosses are listed separately for each generation. For example, in design 1, the eight-way individuals are replicated three times and then selfed for four generations. The total number of crosses is shown in parentheses.
RIL, recombinant inbred line.
Five MAGIC population designs tested in magicdesignThe numbers of replicates, selfing generations and crosses are listed separately for each generation. For example, in design 1, the eight-way individuals are replicated three times and then selfed for four generations. The total number of crosses is shown in parentheses.RIL, recombinant inbred line.First, we investigated the designs’ effects on recombinant haplotypes within a 5 cM interval. In term of total recombinant haplotypes, a good design should have high mean with low variance. High mean implies a reduction in linkage disequilibrium (LD) and thus improves QTL mapping resolution (Ladejobi ) as well as prediction of marker effects in genomic prediction (GP). Low variance ensures that the proportion of recombinant haplotypes in the created MAGIC population remains close to the simulated mean and minimizes the risk of constructing a poorly recombined MAGIC population. For designs 1–5, respectively, the means are 0.167, 0.167, 0.169, 0.186, and 0.202 while the variances are 0.000158, 0.000244, 0.000293, 0.000452, and 0.003000 (Figure 6A). The means are similar in designs 1–3, slightly higher in design 4 and highest in design 5. However, the variances are lowest in design 1, similar in designs 2 and 3, slightly larger in design 4, and substantially larger in design 5.
Figure 6
Distributions of recombinant haplotypes in five MAGIC population designs. Recombinant haplotypes are evaluated within a 5 cM interval over 100 iterations of simulation. (A) Proportions of total recombinant haplotypes. (B) Number of unique recombinant haplotypes. (C) Proportions of six chosen recombinant haplotypes. Complete results are available in Supplementary Table S5.
Distributions of recombinant haplotypes in five MAGIC population designs. Recombinant haplotypes are evaluated within a 5 cM interval over 100 iterations of simulation. (A) Proportions of total recombinant haplotypes. (B) Number of unique recombinant haplotypes. (C) Proportions of six chosen recombinant haplotypes. Complete results are available in Supplementary Table S5.In any RIL derived from an eight-founder MAGIC population, there are 56 distinct recombinant haplotypes and a good design should have high mean with low variance. Mean number of unique recombinant haplotypes that approaches the theoretical maximum of is important for maximizing QTL mapping resolution and generating novel haplotypes for breeding new varieties. In a population of MAGIC RILs with high proportion of recombinant haplotypes but low number of unique recombinant haplotypes, the QTL mapping resolution can be poor when the recombinant haplotypes are largely composed of pairs of founders carrying the same causative QTL haplotype. Low variance is beneficial for the same reason as explained in the previous paragraph. The means for the number of unique recombinant haplotypes are 52.20, 49.93, 50.29, 47.81, and 34.51 for designs 1–5, respectively, while the variances are 3.31, 4.29, 4.21, 4.68, and 34.78 for designs 1–5, respectively (Figure 6B). The means are highest in design 1, similar in designs 2 and 3, slightly lower in design 4 and lowest in design 5. The variances follow a similar but reverse trend as the means except for design 5 where the variance is over seven times greater. Equivalently, the coefficients of variation (CVs) are 0.035, 0.041, 0.041, 0.045, and 0.171 for designs 1–5, respectively. That for design 5 is approximately four times that for the other designs.The distributions of individual recombinant haplotype should be consistent across all recombinant haplotypes with minimal variability across simulations in a good design. This metric offers an in-depth view of individual recombinant haplotypes by combining the two previously described metrics. Here, we can identify the individual recombinant haplotypes that deviate from the others, which can be a cause of concern relating to poor QTL mapping resolution and lack of novel recombinant haplotypes for breeding uses. With the exception of design 5, all other designs have similar distributions of individual recombinant haplotype (Figure 6C and Supplementary Table S5). Similar to wheat-DE8 (Figure 2B), cowpea and tomato (Figure 4), design 5 has more two-ways recombinant haplotypes than other recombinant haplotypes. Furthermore, the spreads of two-ways recombinant haplotypes in design 5 are much higher than the others, which imply low consistency.Similar to the previous criterion, the proportions of founder genomes should be consistent across all founders with low variability across simulations in a good design. This is an important metric that highlights the disparity in founder genome distribution. Multiple uses of MAGIC populations are compromised when the disparity is large, for instance, rare QTLs may drop out, GP training model and breeding options become skewed, and valuable diversity is lost in a genetic resource management program. With eight founders, the expected proportion of each founder genome in a population is 0.125. The proportions are within 0.01 of expectation for all designs except for design 5, which has 5 out of 8 proportions exceeding the range (Figure 7A). On the other hand, the variances are lowest in design 1, slightly higher in designs 2–4, and highest in design 5 (Figure 7A).
Figure 7
Distributions of founder genomes in five MAGIC population designs. Founder genomes are evaluated from 100 iterations of simulation. (A) Proportions of each founder genome in the MAGIC RILs. (B) Proportions of the MAGIC RILs carrying tracts of 1–8 unique founder genomes in each chromosome. (C) Mean count of nonrecombinant segment length in each RIL’s chromosome.
Distributions of founder genomes in five MAGIC population designs. Founder genomes are evaluated from 100 iterations of simulation. (A) Proportions of each founder genome in the MAGIC RILs. (B) Proportions of the MAGIC RILs carrying tracts of 1–8 unique founder genomes in each chromosome. (C) Mean count of nonrecombinant segment length in each RIL’s chromosome.In any single chromosome, a RIL can carry tracts of 1–8 unique founder genomes and it is generally better to have more unique founder genomes. This metric is similar to the first metric showing the total recombinant haplotypes where higher number of unique founder genome suggests more recombinations. In addition, this metric highlights the relationship between genetic length and the number of unique founder genomes, which demonstrates the advantages of MAGIC in species with many genetically long chromosomes. In the shortest chromosome (chromosome 1), designs 1–3 frequently produce three unique founders, while designs 4 and 5 frequently produce four unique founders (Figure 7B). In the longest chromosome (chromosome 5), designs 1–4 frequently produce six unique founders while design 5 frequently produces seven unique founders (Figure 7B).Finally, a good design should have short nonrecombinant segments, which can be achieved by increasing the number of crossing generations from the founders to the RILs. Short nonrecombinant segments imply higher QTL mapping resolution and possibly better marker effect prediction in GP. Across all chromosomes, design 5 has the shortest nonrecombinant segments, followed by design 4, and designs 1–3 being undiscernible (Figure 7C).Of all the five designs considered here, each has its own advantages and disadvantages. Designs 1–3 are highly similar except that design 1 tends to show smaller variability than the other two at the cost of more crossing work required. Design 4 is slightly better than the first three in most occasions, although it is slightly more variable and requires one additional generation. Design 5 is generally poor and should be avoided if possible, although the additional crossing generation helps in increasing the number of unique founders and reducing nonrecombinant segment lengths. Of all designs considered, design 4 appears to be the best option if the additional generation is acceptable, otherwise either designs 2 or 3 is a good alternative. Across all the metrics used for comparisons in “magicdesign,” there is no observable difference between designs 2 (balanced) and 3 (unbalanced).
Discussion
Ease of design appears as a major factor in driving the design choices in currently available MAGIC populations. These MAGIC populations are predominantly made of four or eight founders crossed using a basic design (Figure 1B). There are several possible explanations to the choice popularity. First, P2 designs are easier to handle than NP2 designs since the individuals in every funnel are halved at every crossing generation. Besides, four and eight founders are effectively the lowest numbers of founders available in P2 designs, and higher numbers of founders require more generations of crossing and may increase the design complexity. Of all the explored designs, the basic design likely requires the least amount of effort in population construction. The only other option that may rival a basic design is the unstructured design with random mating, which often relies on segregation of male sterility loci. Unfortunately, this system is not always readily available in every species and may restrict the founder choices.Choice of MAGIC population design plays a critical role in determining the recombination landscape in the RILs. In the comparison between wheat-UK8 and wheat-DE8, we identified a bias in individual recombinant haplotypes in the basic design but not the partial design (Figure 2). The bias resulted in more two-way recombinant haplotypes than other recombinant haplotypes. The bias might be exacerbated if the pairs of founders in two-ways are genetically more similar than others, which can happen if the founders stratify into two or more groups. It is possible to avoid pairing the founders of the same groups in two-ways if the grouping is known. For example, Pascual made the two-way crosses by crossing tomato founders with large fruits to founders with small fruits, and Ogawa followed similarly by crossing indica rice founders to japonica rice founders. This countermeasure is only possible if the numbers of founders are equal between groups, but not if the founders cannot be subdivided equally like the barley (Sannemann ), cowpea (Huynh ), and wheat (Stadlmeier ) MAGIC populations. Besides, the stratification may be incomplete due to other traits that are not considered, for example, flowering time and nutrition qualities in tomatoes.In addition to the bias, the basic design also resulted in a lower proportion of unique recombinant haplotypes than the partial design (Figure 3). Because a basic design always has less funnels than any other designs, high replication of cross progeny is required to bring the number of RILs up. In general, replicates reduce the amount of crossing work required in prior generation by keeping more than one progeny from a single cross to advance. The recombination landscape in these replicated individuals is nonindependent because any prior recombinations are passed down from their parents. The detriments from replication can be minimized by replicating in earlier generations as subsequent crosses will reduce the nonindependence among replicates. In a MAGIC population with 8 inbred founders, the earliest meaningful replication would be the four-way individuals. However, replicates prior to the final crosses do increase the amount of downstream crossing work, and so it is important to consider the tradeoffs between available work resources and uniqueness of recombinant haplotypes.High marker density is needed to capture the highly recombined genomes of MAGIC RILs. We used the PRHR as a measure of how well the markers capture recombinant haplotypes. PRHRs in the five analyzed datasets correlate well with the marker density. Even with the high marker density in wheat-UK16, the PRHR is only 0.799 (Table 1), which suggests that one-fifth of the recombinant haplotypes is still missing. Some explanations include the sparser marker density in the D-genomes, uneven marker density along the genomes and segregation distortions of the founder genomes. To generalize the relationship between marker density and recombinant haplotypes further, our simulation results showed that marker distance of 0.80 cM or less is sufficient to recover over three-quarters of the recombinant haplotypes. Despite the results from actual datasets being less optimistic than the results from simulation, the importance of high marker density in MAGIC populations still holds.Given that the advantages and disadvantages of different MAGIC population designs are largely unexplored, the “magicdesign” package serves as an important tool to create and test different designs. Specifically, “magicdesign” provides the opportunity to evaluate the options before committing to years of effort in constructing MAGIC populations. In our examples, an additional selfing generation offers a simple path to improvement (Figures 6 and 7), especially in inbreeding species. When used in combination with speed breeding (Watson ), the additional time due to selfing can be minimized. In addition, “magicdesign” also acts as a bridging tool for researchers who are new to MAGIC populations by providing a starting point to creating a MAGIC population. The opportunity to create and test different designs will encourage innovation in MAGIC population designs rather than relying on previously used designs.While the initial version of “magicdesign” package involves simulations with relatively simple parameters, we intend to expand the package scopes to cover broader biological aspects that are relevant to MAGIC. For example, the recombination landscape is generally perceived to be uneven (Petes 2001) and it will be useful to consider recombination hot and cold spots. Gene density varies along the genome and may need to be accounted for in MAGIC simulation, although noncoding and unannotated regions cannot be discounted too given their biological importance (Jiang 2015). Founder diversity was previously shown to be important in MAGIC population design (Ladejobi ). Currently, this feature is not available in “magicdesign” and will be considered as a priority for subsequent versions. Overall, “magicdesign” is a valuable resource for unifying the process of creating and testing MAGIC population designs, and providing the flexibility for additional features to be included in future updates as the package grows with users’ feedback and research demands.
Data availability
The MAGIC datasets used in this work were sourced from www.niab.com/research/agricultural-crop-research/resources/niab-magic-population-resources (wheat-UK8), doi:10.1186/s12864-018-4915-3 (wheat-DE8), doi:10.1111/tpj.13827 (cowpea), doi:10.1111/pbi.12282 (tomato), and mtweb.cs.ucl.ac.uk/mus/www/MAGICdiverse/(wheat-UK16). Links to source datasets without DOI were last accessed on April 13, 2021 and have been archived at web.archive.org on the same day.The “magicdesign” package and its installation instructions are available for download at github.com/cjyang-sruc/magicdesign. Detailed instructions are available at cjyang-sruc.github.io/magicdesign_vignette. The Shiny app “magicdesignee” can be found at magicdesign.shinyapps.io/magicdesignee/. R scripts used in all analyses can be found at cjyang-sruc.github.io/files/magicdesign_analysis.R.Supplementary materials available online at G3.Click here for additional data file.
Authors: Bao-Lam Huynh; Jeffrey D Ehlers; Bevan Emma Huang; María Muñoz-Amatriaín; Stefano Lonardi; Jansen R P Santos; Arsenio Ndeve; Benoit J Batieno; Ousmane Boukar; Ndiaga Cisse; Issa Drabo; Christian Fatokun; Francis Kusi; Richard Y Agyare; Yi-Ning Guo; Ira Herniter; Sassoum Lo; Steve I Wanamaker; Shizhong Xu; Timothy J Close; Philip A Roberts Journal: Plant J Date: 2018-02-24 Impact factor: 6.417
Authors: Susanna Atwell; Yu S Huang; Bjarni J Vilhjálmsson; Glenda Willems; Matthew Horton; Yan Li; Dazhe Meng; Alexander Platt; Aaron M Tarone; Tina T Hu; Rong Jiang; N Wayan Muliyati; Xu Zhang; Muhammad Ali Amer; Ivan Baxter; Benjamin Brachi; Joanne Chory; Caroline Dean; Marilyne Debieu; Juliette de Meaux; Joseph R Ecker; Nathalie Faure; Joel M Kniskern; Jonathan D G Jones; Todd Michael; Adnane Nemri; Fabrice Roux; David E Salt; Chunlao Tang; Marco Todesco; M Brian Traw; Detlef Weigel; Paul Marjoram; Justin O Borevitz; Joy Bergelson; Magnus Nordborg Journal: Nature Date: 2010-03-24 Impact factor: 49.962
Authors: Shichen Wang; Debbie Wong; Kerrie Forrest; Alexandra Allen; Shiaoman Chao; Bevan E Huang; Marco Maccaferri; Silvio Salvi; Sara G Milner; Luigi Cattivelli; Anna M Mastrangelo; Alex Whan; Stuart Stephen; Gary Barker; Ralf Wieseke; Joerg Plieske; Morten Lillemo; Diane Mather; Rudi Appels; Rudy Dolferus; Gina Brown-Guedira; Abraham Korol; Alina R Akhunova; Catherine Feuillet; Jerome Salse; Michele Morgante; Curtis Pozniak; Ming-Cheng Luo; Jan Dvorak; Matthew Morell; Jorge Dubcovsky; Martin Ganal; Roberto Tuberosa; Cindy Lawley; Ivan Mikoulitch; Colin Cavanagh; Keith J Edwards; Matthew Hayden; Eduard Akhunov Journal: Plant Biotechnol J Date: 2014-03-20 Impact factor: 9.803