Armin C Hölker, Manfred Mayer, Thomas Presterl, Eva Bauer, Milena Ouzunova, Albrecht E Melchinger, Chris-Carolin Schön.
Abstract
Discovery and enrichment of favorable alleles in landraces are key to making them accessible for crop improvement. Here, we present two fundamentally different concepts for genome-based selection in landrace-derived maize populations, one based on doubled-haploid (DH) lines derived directly from individual landrace plants and the other based on crossing landrace plants to a capture line. For both types of populations, we show theoretically how allele frequencies of the ancestral landrace and the capture line translate into expectations for molecular and genetic variances. We show that the DH approach has clear advantages over gamete capture with generally higher prediction accuracies and no risk of masking valuable variation of the landrace. Prediction accuracies as high as 0.58 for dry matter yield in the DH population indicate high potential of genome-based selection. Based on a comparison among traits, we show that the genetic makeup of the capture line has great influence on the success of genome-based selection and that confounding effects between the alleles of the landrace and the capture line are best controlled for traits for which the capture line does not outperform the ancestral population per se or in testcrosses. Our results will guide the optimization of genome-enabled prebreeding schemes.
Keywords: doubled haploids; gamete capture; genomic selection; landraces
Year: 2022 PMID: 35486687 PMCID: PMC9170147 DOI: 10.1073/pnas.2121797119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Scheme of population development for the pure and admixed approaches.
Fig. 2. Venn diagram of the number and percentage of marker polymorphisms shared by and exclusive to the sample of the ancestral landrace (LS), DH lines, and GC lines of KE (A). Means and estimated densities of genetic distances (GD) between genotypes within LS, DH, and GC using all markers (B) and between GC lines and FV2 using only markers for which DH and LS were monomorphic for the allele not carried by FV2 (C). Estimated density of the frequency of the FV2 allele in LS and GC (D). Allele frequencies in DH vs. LS (E) and expected frequencies in GC (calculated from LS and known FV2 genotype) vs. observed GC (F). The calculated numbers of marker polymorphisms (A) are the result of sampling 80 gametes per population with 500 replications and are shown as the absolute number and percentage of polymorphic markers (± SD). In GC, the number of polymorphic markers resulting from the cross with FV2 (LS and DH monomorphic for the allele not carried by FV2) is shown as the average across 500 sampling replications. The tables in B and C show the means of the genetic distances and their expected values (calculated from LS allele frequencies). B–F are based on the whole set of lines (i.e., N = 48 [LS], N = 471 [DH], and N = 274 [GC]).
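The expected GC frequencies mentioned for panel F can be illustrated with a minimal sketch. It assumes that, on average, half of each GC genome traces back to the landrace and half to the homozygous capture line FV2, so the expected frequency is the mean of the two parental contributions. The function name and the example frequencies are hypothetical and are not taken from the study.

```python
def expected_gc_frequency(p_ls, fv2_carries_allele):
    """Expected frequency of a coded allele in GC lines (illustrative only).

    p_ls               : frequency of the coded allele in the landrace sample (LS)
    fv2_carries_allele : True if the homozygous capture line FV2 carries it
    Assumption: LS and FV2 each contribute half of the GC genome on average.
    """
    fv2_frequency = 1.0 if fv2_carries_allele else 0.0
    return 0.5 * p_ls + 0.5 * fv2_frequency


# Hypothetical markers: (LS frequency of the coded allele, FV2 carries it?)
markers = [(0.0, True), (0.25, True), (0.5, False), (1.0, False)]
expected = [expected_gc_frequency(p, carried) for p, carried in markers]
```

Under this assumption, a marker that is monomorphic in LS for the allele not carried by FV2 (the first and last hypothetical markers) has an expected frequency of 0.5 in GC, which is how the cross with FV2 can create polymorphisms that are absent from LS and DH.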
Fig. 3. Decay of LD with physical distance for the sample of the ancestral landrace (LS), the DH lines, and the GC lines of landraces KE and PE (A). Linkage phase similarities (LPS) for pairwise comparisons of the three types of populations within each landrace (B) and LPS for pairwise comparisons of the same type of population across the two landraces (C). For all calculations, 94 gametes were randomly sampled for each group.
Quantitative-genetic expectations of means and genetic variances for per se (PP) and testcross (TP) performance in the sample of the ancestral landrace (LS), derived DH, and GC lines
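Entries are the coefficients of the parameters named in the column headers.

| Population | Population mean: additive effect, capture line | Population mean: additive effect, LS | Population mean: dominance, LS | Population mean: dominance, GC-S1:2 | Primary variance: additive variance, LS | Primary variance: additive variance, capture line | Variance within families: additive variance, LS | Variance within families: additive variance, capture line | Total variance: additive variance, LS | Total variance: additive variance, capture line |
|---|---|---|---|---|---|---|---|---|---|---|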
| LS | 0 | 1 | 1 | 0 | 1 | 0 | — | — | 1 | 0 |
| DH | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 0 |
| GC-S1:2 | 1/2 | 1/2 | 0 | 1/4 | 3/4 | 1/4 | 1/8 | 1/8 | 7/8 | 3/8 |
| GC-S1:∞ | 1/2 | 1/2 | 0 | 0 | 3/4 | 1/4 | 1/4 | 1/4 | 1 | 1/2 |
| FV2 | 1 | 0 | 0 | 0 | — | — | — | — | — | — |
For GC lines, the total genetic variance is decomposed into the primary variance between families as observed for GC-S1:2 lines in this study and the variance within families.
†Parameters and are not required for TP.
‡ and refer to the frequencies of alleles and in LS, respectively. and refer to the additive effects in LS and the capture line, respectively, with different meanings for PP and TP. and refer to the contribution of dominance effects to the PP of LS and GC-S1:2, respectively, where , , and refer to the dominance effect of genotypes , , and , respectively, with being the allele of the capture line. refers to the additive variance inherent in the ancestral landrace, with . refers to the additive variance resulting from the effects of the capture line alleles, with (details are in ).
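As a quick consistency check on the decomposition described in the first footnote, the sketch below adds the primary and within-family coefficients of the two GC rows and compares the sums with the total-variance coefficients. The coefficients are copied from the table; the dictionary layout and function name are illustrative choices, and each pair holds the two variance coefficients in column order.

```python
from fractions import Fraction as F

# Variance coefficients copied from the table rows for GC lines.
# Each pair lists the two coefficient columns of a variance group in order.
gc_rows = {
    "GC-S1:2": {
        "primary": (F(3, 4), F(1, 4)),
        "within": (F(1, 8), F(1, 8)),
        "total": (F(7, 8), F(3, 8)),
    },
    "GC-S1:inf": {
        "primary": (F(3, 4), F(1, 4)),
        "within": (F(1, 4), F(1, 4)),
        "total": (F(1), F(1, 2)),
    },
}


def total_from_parts(primary, within):
    """Sum primary (between-family) and within-family coefficients column-wise."""
    return tuple(p + w for p, w in zip(primary, within))


# True for a row when primary + within reproduces the tabulated total.
checks = {
    name: total_from_parts(row["primary"], row["within"]) == row["total"]
    for name, row in gc_rows.items()
}
```

Both rows pass the check, so the tabulated totals are the column-wise sums of the two components.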
Fig. 4. Estimated densities showing the distribution of phenotypic values for per se performance (PP; A) and testcross performance (TP; B) of the DH and GC lines for landrace KE, scatterplots of proportions of FV2 genome vs. TP for flowering time (C), and estimated genetic values of PP vs. estimated genetic values of TP for flowering time in DH (D) and GC (E) lines. In A and B, the means (vertical lines) of the landrace sample (LS, dark green) and the capture line FV2 (yellow) are indicated, and the tables show the means, genetic variances, and heritabilities (h²). Means with a shared letter are not significantly different (P > 0.05). C–E indicate the Pearson correlation coefficients and corresponding P values of the shown correlations.
Fig. 5. Prediction accuracy (ρ) in landrace KE for per se performance (PP) in the DH and GC lines as a function of sample size N (A), for prediction of PP and testcross performance (TP) at the maximum available number of lines (Nmax; B), for predictions within and across populations for PP (C) and TP (D) in DH and GC, and for across-landrace prediction for PP from KE (training on PE; E). Traits are plant height at V6 stage (PH_V6), final plant height (PH_final), and flowering time (FF) in PP and TP and dry matter content (DMC) and total dry matter yield (TDMY) in TP. For each N (A), sampling of lines was repeated 100 times, and fivefold cross-validation was repeated 10 times within each sample, yielding the basis for calculating the presented means and 95% quantiles (shaded areas around the curve). Prediction across and within populations as well as across landraces was carried out by randomly sampling N = 200 and N = 75 lines for training in PP (C and E) and TP (D), respectively, for predicting N = 50 (PP; C and E) or N = 25 (D) genotypes of the same or corresponding population (C and D) or the same population of the other landrace (E). Sampling was repeated 100 times. The violin plots (C–E) show all 100 values, with the diamonds indicating the means. Black dots show values of the prediction accuracy estimated from models where the genomic variance estimate was not significant (likelihood-ratio test, P > 0.05).
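The resampling scheme in panel A (fivefold cross-validation repeated 10 times) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: accuracy is taken here as the Pearson correlation between predicted and observed values per fold, the prediction model is a deliberately simple stand-in (summed per-marker regression slopes) because the caption does not name the model, and the genotypes and phenotypes are simulated. All function and variable names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def marginal_marker_predictor(train_X, train_y, test_X):
    """Stand-in model (assumption): sum of per-marker least-squares slopes."""
    n = len(train_y)
    mean_y = sum(train_y) / n
    slopes, means = [], []
    for j in range(len(train_X[0])):
        col = [row[j] for row in train_X]
        mean_j = sum(col) / n
        sxx = sum((v - mean_j) ** 2 for v in col)
        sxy = sum((v - mean_j) * (t - mean_y) for v, t in zip(col, train_y))
        slopes.append(sxy / sxx if sxx > 0 else 0.0)  # skip monomorphic markers
        means.append(mean_j)
    return [
        mean_y + sum(b * (row[j] - means[j]) for j, b in enumerate(slopes))
        for row in test_X
    ]


def repeated_kfold_accuracy(X, y, fit_predict, k=5, repeats=10, seed=1):
    """Return one accuracy per fold for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        order = list(range(len(y)))
        rng.shuffle(order)
        for fold in range(k):
            test_idx = order[fold::k]          # every k-th shuffled line
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            predicted = fit_predict(
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            accuracies.append(pearson(predicted, [y[i] for i in test_idx]))
    return accuracies


# Simulated data (not from the study): 100 homozygous lines, 20 markers.
sim = random.Random(42)
effects = [sim.gauss(0, 1) for _ in range(20)]
X = [[sim.choice((0, 2)) for _ in range(20)] for _ in range(100)]
y = [sum(b * g for b, g in zip(effects, row)) + sim.gauss(0, 3) for row in X]

accuracies = repeated_kfold_accuracy(X, y, marginal_marker_predictor)
mean_accuracy = sum(accuracies) / len(accuracies)
```

To mimic the curve in panel A, the same function would be called on repeated random subsets of N lines, and the means and quantiles of the resulting accuracies would be summarized for each N.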