Abstract
Crop improvement represents a long-running experiment in artificial selection on a complex trait, namely yield. How such selection relates to natural populations is unclear, but the analysis of domesticated populations could offer insights into the relative role of selection, drift, and recombination in all species facing major shifts in selective regimes. Because of the extreme autogamy exhibited by soybean (Glycine max), many "immortalized" genotypes of elite varieties spanning the last century have been preserved and characterized using ∼50,000 single nucleotide polymorphism (SNP) markers. Also due to autogamy, the history of North American soybean breeding can be roughly divided into pre- and posthybridization eras, allowing for direct interrogation of the role of recombination in improvement and selection. Here, we report on genome-wide characterization of the structure and history of North American soybean populations and the signature of selection in these populations. Supporting previous work, we find that maturity defines population structure. Though the diversity of North American ancestors is comparable to available landraces, prehybridization line selections resulted in a clonal structure that dominated early breeding and explains many of the reductions in diversity found in the initial generations of soybean hybridization. The rate of allele frequency change does not deviate sharply from neutral expectation, yet some regions bear hallmarks of strong selection, suggesting a highly variable range of selection strengths biased toward weak effects. We also discuss the importance of haplotypes as units of analysis when complex traits fall under novel selection regimes.
Keywords: detecting selection; haplotype frequencies; maturity groups; selection on standing variation
Year: 2016 PMID: 27402364 PMCID: PMC5015928 DOI: 10.1534/g3.116.029215
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Population structure of modern public soybean varieties. Rows in the symmetrical IBS matrix are sorted by maturity group (right panel) and then by date-of-release as indicated in adjacent xy plots. Population assignments and admixture are shown in the far-right panel given a model of three populations (see Materials and Methods). Colored boxes within the matrix indicate how the total set of accessions was divided into populations for further analysis. IBS, identity-by-state; SNP, single nucleotide polymorphism.
Power (%) of assorted statistics in the detection of selection at a 5% false positive rate for selection coefficients of 0.02, 0.05, and 0.1 and initial favored allele frequencies of 0.2, 0.5, and 0.8
| Initial Frequency of Favored Allele | 0.2 | 0.2 | 0.2 | 0.5 | 0.5 | 0.5 | 0.8 | 0.8 | 0.8 | |
|---|---|---|---|---|---|---|---|---|---|---|
| Selection Coefficient | 0.02 | 0.05 | 0.1 | 0.02 | 0.05 | 0.1 | 0.02 | 0.05 | 0.1 | Average Power |
| Without sampling: sample frequency equals population frequency | ||||||||||
| Δf | 14.2 | 52.3 | 99.5 | 11.6 | 43.3 | 94.0 | 9.3 | 19.1 | 48.4 | 43.52 |
| Logistic β | 2.6 | 5.5 | 59.4 | 12.8 | 50.5 | 97.9 | 11.5 | 24.9 | 61.7 | 36.31 |
| Fst | 23.6 | 69.4 | 99.6 | 13.1 | 48.6 | 97.0 | 0.7 | 0.0 | 0.0 | 39.11 |
| WFABC | 4.4 | 13.2 | 80.6 | 14.0 | 44.3 | 96.1 | 13.1 | 25.0 | 54.0 | 38.3 |
| With sampling: n = (31, 28, 59, 59, 22) | ||||||||||
| Δf | 9.3 | 38.8 | 95.3 | 10.8 | 31.3 | 80.4 | 8.5 | 13.9 | 26.2 | 34.94 |
| Logistic β | 2.8 | 7.7 | 58.4 | 12.5 | 45.2 | 93.8 | 12.3 | 27.2 | 61.8 | 35.74 |
| Fst | 19.0 | 58.2 | 99.6 | 11.3 | 33.7 | 62.8 | 1.6 | 1.9 | 2.1 | 32.24 |
| WFABC | 0.3 | 0.0 | 0.0 | 9.0 | 24.0 | 58.4 | NA | NA | NA | 15.2 |
Δf, frequency change per generation; Fst, fixation index; WFABC, Wright–Fisher ABC-based approach; NA, not applicable.
Based on MG III-IV sampling depth.
WFABC algorithm failed when favored allele frequencies were 0.8 and sampling was used.
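As a quick consistency check on the table above, the "Average Power" column can be reproduced as the row mean of the nine parameter combinations. The sketch below assumes that NA cells are simply skipped when averaging; under that assumption the six available WFABC cells with sampling average to about 15.28, close to the tabulated 15.2. The dictionary keys and the function name are illustrative, not taken from the source.

```python
from statistics import mean

# Power values (%) copied from two rows of the table above.
# None marks cells reported as NA.
power = {
    "delta_f_no_sampling": [14.2, 52.3, 99.5, 11.6, 43.3, 94.0, 9.3, 19.1, 48.4],
    "wfabc_with_sampling": [0.3, 0.0, 0.0, 9.0, 24.0, 58.4, None, None, None],
}


def average_power(cells):
    """Mean of the non-NA cells in one table row (assumption: NA is skipped)."""
    observed = [c for c in cells if c is not None]
    return mean(observed)


for name, cells in power.items():
    print(name, round(average_power(cells), 2))
```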
Effective population size (Ne) estimates for populations
| Population | Total Individuals Per Timepoint | Total Markers | Ne | 95% C.I. |
|---|---|---|---|---|
| MG 0-I | 27, 20, 40, 34, 16 | 8123 | 172 | 166–178 |
| MG III-IV | 31, 28, 59, 59, 22 | 6903 | 115 | 112–118 |
| MG V+ | 16, 22, 24, 38, 32 | 6625 | 273 | 260–287 |
C.I., confidence interval.
Number of markers with a major allele frequency between 0.5 and 0.6 in the first timepoint sample.
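The estimator behind the Ne values is not stated in this record. As an illustration only, the sketch below uses a standard temporal-method estimator (the Nei and Tajima Fc statistic with a sample-size correction), restricted to markers whose major allele frequency in the first sample lies between 0.5 and 0.6, as the footnote describes. The function names, the choice of estimator, and treating the sample sizes as diploid individuals are all assumptions.

```python
def fc_statistic(x, y):
    """Standardized variance in allele frequency (Nei and Tajima's Fc)
    for one biallelic marker with frequencies x (first) and y (last)."""
    return (x - y) ** 2 / ((x + y) / 2 - x * y)


def temporal_ne(first, last, generations, n_first, n_last, lo=0.5, hi=0.6):
    """Temporal-method Ne from allele frequencies at two timepoints.

    first, last : per-marker allele frequencies in the two samples
    generations : number of generations separating the samples
    n_first, n_last : individuals sampled at each timepoint (assumed diploid)
    lo, hi : keep markers whose major allele frequency in the first
             sample falls within [lo, hi]
    """
    kept = [(x, y) for x, y in zip(first, last) if lo <= max(x, 1 - x) <= hi]
    if not kept:
        raise ValueError("no markers pass the major allele frequency filter")
    fc = sum(fc_statistic(x, y) for x, y in kept) / len(kept)
    # Remove the variance contributed by finite sampling at both timepoints.
    corrected = fc - 1 / (2 * n_first) - 1 / (2 * n_last)
    if corrected <= 0:
        return float("inf")  # drift signal indistinguishable from sampling noise
    return generations / (2 * corrected)


# Toy input: the second marker is dropped by the 0.5-0.6 filter.
ne = temporal_ne([0.5, 0.9], [0.7, 0.9], generations=7, n_first=50, n_last=50)
print(ne)
```

Restricting to intermediate-frequency markers keeps the denominator of Fc well away from zero, which is one plausible reason for the filter described in the footnote.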
Counts and relative frequencies of selection modes for sliding window analysis
| | MG 0-I | MG III-IV | MG V+ | All |
|---|---|---|---|---|
| Total windows | 4151 | 4151 | 4151 | 12453 |
| Δf threshold | 0.013 | 0.018 | 0.014 | NA |
| Diversity threshold | 1.9 | 1.9 | 1.3 | NA |
| Haplotype sneak | 216 (53%) | 166 (38%) | 38 (13%) | 420 (37%) |
| Hard sweep | 16 (4%) | 15 (4%) | 32 (10%) | 63 (6%) |
| Soft sweep | 173 (43%) | 246 (58%) | 237 (77%) | 656 (58%) |
See Figure 4 for additional clarification of terminology.
Percent of total putatively selected regions.
Figure 4. Modes of selection acting within and across each population. Each point represents the results for a window of 50 markers incremented by 10 markers along the chromosome. The diversity of lines released in the 2000s relative to lines released prior to 1970 is plotted on the y-axis for each population. The average of the top three absolute values of Δf for each window is plotted on the x-axis. Categories are color coded based on the thresholds defined in Table 5.
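The x-axis quantity in Figure 4 can be sketched as follows: slide a 50-marker window in 10-marker steps and average the three largest absolute Δf values in each window. This is a minimal illustration with hypothetical function names; it uses 0-based marker indices and drops a trailing partial window, both of which are assumptions not specified in the caption.

```python
def window_starts(n_markers, size=50, step=10):
    """0-based start indices of full windows along one chromosome."""
    return range(0, n_markers - size + 1, step)


def top3_mean_abs(delta_f, size=50, step=10, top=3):
    """Return (start, mean of the `top` largest |delta_f|) for each window."""
    results = []
    for start in window_starts(len(delta_f), size, step):
        window = delta_f[start:start + size]
        largest = sorted((abs(v) for v in window), reverse=True)[:top]
        results.append((start, sum(largest) / len(largest)))
    return results


# Toy chromosome of 70 markers with three markers changing in frequency.
delta_f = [0.0] * 70
delta_f[5] = 0.03
delta_f[25] = -0.06
delta_f[65] = 0.09
for start, score in top3_mean_abs(delta_f):
    print(start, round(score, 3))
```

Taking the absolute value before ranking means that alleles rising and falling in frequency contribute equally to the window score.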
Figure 5. Haplotype spectra associated with selection. Top two panels within each subfigure show the frequency of each 50 marker haplotype, in 10 marker increments, along a chromosome for pre-1970s and 2000s samples in a given population. Markers are numbered 1 through the last marker on the chromosome in order of physical position (File S4). Haplotypes are ordered in the top panel, purple to crimson, based on frequency in pre-1970s sample. All additional haplotypes are colored gray. The same haplotype in both samples, pre-1970s and 2000s, will have the same color, excepting gray haplotypes. Note that shared colors left to right along the chromosomes could represent unlinked haplotypes although similar frequencies (and frequency changes) generally suggest linkage. The third panel shows average pairwise difference (π) for 50 marker windows; lines are colored based on indicated samples. In the fourth panel, blue lines indicate the frequency change of the most rapidly changing haplotype within a window. Orange lines indicate the average change of the three most rapidly changing alleles in a window. Haplotype change is based on the first (pre-1970s) and last (2000s) samples, whereas allele change is based on all sampled decade groupings (see Materials and Methods). The last panel depicts the average shared haplotype length (H) around a marker for pre-1970s (purple) and 2000s (yellow) samples. Gray dotted lines throughout indicate the median of values in a panel, while red lines, when present, indicate genome-wide thresholds described in Table 5 and shown in Figure 4. See (D) for annotation of figure elements. (A) Section of chromosome 11 from population MG 0-I. (B) Section of chromosome 5 from MG III-IV. (C) Section of chromosome 19 from MG V+. (D) Entire chromosome 1 from MG 0-I.
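Two of the per-window quantities in Figure 5, haplotype frequency and average pairwise difference (π), can be illustrated with a short sketch. It assumes fully homozygous lines encoded as strings with one character per marker, and it reports π as the mean number of differing markers per pair within the window (not per site); both conventions and the function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def haplotype_frequencies(lines, start, size=50):
    """Frequency of each distinct haplotype in one window.

    lines : list of equal-length strings, one character per marker
    """
    haps = [line[start:start + size] for line in lines]
    counts = Counter(haps)
    return {hap: n / len(haps) for hap, n in counts.items()}


def mean_pairwise_difference(lines, start, size=50):
    """Average number of mismatching markers over all pairs of lines."""
    haps = [line[start:start + size] for line in lines]
    diffs = [sum(a != b for a, b in zip(h1, h2))
             for h1, h2 in combinations(haps, 2)]
    return sum(diffs) / len(diffs)


# Toy sample of four lines and a 4-marker window.
sample = ["AAAA", "AAAA", "AABB", "BBBB"]
print(haplotype_frequencies(sample, start=0, size=4))
print(mean_pairwise_difference(sample, start=0, size=4))
```

Comparing the frequency dictionaries of the pre-1970s and 2000s samples for the same window gives the haplotype frequency change plotted in the fourth panel.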
Figure 2. Biased population IBS with major ancestors. For each population, the IBS between each individual and the indicated ancestor is given as a boxplot. Pairwise IBS between each ancestor is shown at the top as a heatmap matrix in which black represents an IBS of 1, or perfect identity. The “other” category represents combined results from all other ancestors in Figure S1. Maturity group of the ancestor is indicated in brackets below the name. PI548445 is shown independently because it is an outlier for its low IBS relative to all other ancestors. IBS, identity-by-state.
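For readers unfamiliar with identity-by-state, a minimal sketch of a pairwise IBS calculation follows. It assumes genotypes coded as 0, 1, or 2 copies of a reference allele with `None` for missing calls, and it defines IBS as the proportion of shared alleles across jointly called markers; this is a common convention, not necessarily the exact one used for the figure.

```python
def ibs(g1, g2):
    """Proportion of alleles identical-by-state between two individuals.

    g1, g2 : sequences of genotype codes (0, 1, 2) or None for missing
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip markers missing in either individual
        shared += 2 - abs(a - b)  # 2, 1, or 0 alleles shared at this marker
        compared += 2
    if compared == 0:
        raise ValueError("no markers called in both individuals")
    return shared / compared


line = [0, 2, 2, 1]
ancestor = [0, 2, 0, 2]
print(ibs(line, ancestor))
```

An IBS of 1 corresponds to the black cells of the heatmap, that is, perfect identity across all compared markers.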
Fractional contribution based on haplotype sharing for major North American ancestors relative to varieties released prior to 1970 within each population
| ID | Name | MG 0-I | MG III-IV | MG V+ |
|---|---|---|---|---|
| PI548362 | Lincoln | 0.23 | 0.29 | 0.02 |
| PI548379 | Mandarin (Ottawa) | 0.26 | 0.03 | 0.01 |
| PI548488 | S-100 | 0.02 | 0.07 | 0.20 |
| PI548485 | Roanoke | 0.03 | 0.04 | 0.19 |
| PI548445 | CNS | 0.00 | 0.04 | 0.18 |
| PI548406 | Richland | 0.12 | 0.14 | 0.02 |
| PI548477 | Ogden | 0.01 | 0.04 | 0.14 |
| PI548391 | Mukden | 0.07 | 0.04 | 0.02 |
| PI548318 | Dunfield | 0.02 | 0.07 | 0.04 |
| PI548461 | Improved Pelican | 0.00 | 0.01 | 0.07 |
| PI548311 | Capital | 0.05 | 0.03 | 0.01 |
| PI548382 | Manitoba Brown | 0.05 | 0.00 | 0.01 |
| PI548360 | Korean | 0.02 | 0.05 | 0.02 |
| PI548325 | Flambeau | 0.04 | 0.01 | 0.00 |
| PI548352 | Jogun | 0.01 | 0.03 | 0.00 |
| PI548402 | Peking | 0.00 | 0.01 | 0.03 |
Only ancestors with values >0.03 are shown. ID, identifier.
Figure 3. Relationship between reduced diversity and founding ancestors of pre-1970s varieties within each population. Only chromosome 10 is shown as a representative example; similar plots for all chromosomes are available as supplemental material (File S1). (A) Log2 ratio of the mean pairwise difference (π) of a given population relative to that of all 29 North American ancestors is given for each window of 50 markers along the chromosome. Markers are numbered 1 through the last marker on the chromosome in order of physical position (File S4). (B) The identity of each marker relative to the given ancestor is depicted for each of the pre-1970s lines from populations MG 0-I and MG V+. Each individual is shown as a row and, therefore, appears as many times as there are ancestors. A locus is colored if it is identical to the ancestor. White space indicates a mismatch. Heterozygous markers, though rare, are colored gray. Gray vertical lines spanning the figure indicate regions of reduced diversity. Position of maturity gene, e2, is labeled. The trough at position 1074 is off scale for all populations (see Table 2).
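The log2 diversity ratio in panel (A) can be sketched as below. Capping the ratio at a floor of −8 is an assumption used here to mimic off-scale troughs, where population diversity is zero or nearly so; the function name and the floor parameter are hypothetical.

```python
import math


def log2_diversity_ratio(pi_population, pi_ancestors, floor=-8.0):
    """log2 of population diversity relative to ancestor diversity.

    Values below `floor` (including zero population diversity, where the
    logarithm is undefined) are reported as `floor`.
    """
    if pi_ancestors <= 0:
        raise ValueError("ancestor diversity must be positive")
    if pi_population <= 0:
        return floor
    return max(floor, math.log2(pi_population / pi_ancestors))


print(log2_diversity_ratio(0.05, 0.2))   # population has a quarter of ancestral diversity
print(log2_diversity_ratio(0.0, 0.05))   # off-scale trough
```

On this scale, a value of −1 means the population retains half of the ancestral diversity in that window, and each further unit halves it again.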
Characterization of regions of reduced diversity in founding ancestors of each population
| Chr. | Pop. | Percent Founders | Midsite Marker | Start (bp) | Stop (bp) | log2(π population / π ancestors) | π ancestors | Major QTL/Gene |
|---|---|---|---|---|---|---|---|---|
| 8 | 0-I | 86 | 2014 | 38,409,071 | 39,587,082 | <−8 | 0.1 | |
| 10 | 0-I | 100 | 1074 | 28,888,713 | 32,173,874 | <−8 | 0.05 | |
| 10 | III-IV | 100 | 1074 | 28,888,713 | 32,173,874 | <−8 | 0.05 | |
| 10 | V+ | 100 | 1074 | 28,888,713 | 32,173,874 | <−8 | 0.05 | |
| 18 | 0-I | 100 | 2274 | 48,936,186 | 49,279,506 | <−8 | 0.18 | |
| 20 | V+ | 100 | 544 | 18,760,809 | 22,082,754 | <−8 | 0.08 | |
| 6 | V+ | 83 | 1034 | 16,372,276 | 16,548,439 | −6.53 | 0.23 | |
| 16 | V+ | 50 | 1214 | 29,198,889 | 29,895,378 | −5.3 | 0.29 | |
| 1 | 0-I | 100 | 504 | 7,476,077 | 10,015,701 | −4.34 | 0.16 | |
| 1 | V+ | 67 | 1484 | 52,033,784 | 52,821,919 | −4.28 | 0.28 | Methionine |
| 19 | 0-I | 86 | 1444 | 36,758,516 | 37,272,085 | −4.24 | 0.22 | |
| 1 | 0-I | 100 | 1234 | 49,196,746 | 49,637,775 | −3.47 | 0.35 | |
| 17 | 0-I | 100 | 324 | 4,161,627 | 4,680,961 | −3.46 | 0.25 | |
| 14 | 0-I | 86 | 1214 | 19,720,771 | 25,312,530 | −3.32 | 0.22 | |
| 20 | V+ | 67 | 684 | 26,836,040 | 29,865,868 | −3.31 | 0.2 | Protein/yield |
| 10 | 0-I | 71 | 1754 | 43,758,245 | 44,436,997 | −3.27 | 0.32 | Maturity- |
| 20 | 0-I | 71 | 714 | 28,639,256 | 31,998,840 | −3.26 | 0.12 | Protein/yield |
| 8 | 0-I | 100 | 2184 | 40,881,278 | 41,453,586 | −3.22 | 0.21 | |
| 12 | V+ | 67 | 334 | 2,681,036 | 3,072,635 | −3.22 | 0.29 | |
| 7 | 0-I | 100 | 1264 | 15,793,416 | 16,436,790 | −3.2 | 0.28 | |
| 11 | III-IV | 75 | 1434 | 33,511,555 | 34,570,537 | −3.04 | 0.14 | |
| 11 | 0-I | 100 | 704 | 10,516,051 | 11,429,744 | −2.87 | 0.15 | |
| 17 | 0-I | 43 | 1044 | 14,159,743 | 15,834,164 | −2.86 | 0.27 | |
| 17 | V+ | 67 | 874 | 13,006,407 | 13,154,755 | −2.82 | 0.4 | |
| 19 | 0-I | 100 | 304 | 3,591,338 | 4,759,347 | −2.8 | 0.12 | |
| 13 | V+ | 67 | 94 | 857,022 | 1,644,249 | −2.78 | 0.31 | |
| 20 | V+ | 83 | 1354 | 42,078,113 | 43,040,384 | −2.74 | 0.25 | Seed weight |
| 4 | III-IV | 33 | 1884 | 48,242,486 | 48,922,546 | −2.63 | 0.29 | |
| 11 | V+ | 67 | 1424 | 33,371,745 | 34,403,523 | −2.57 | 0.14 | |
| 20 | V+ | 50 | 774 | 32,581,226 | 33,137,092 | −2.56 | 0.22 | Maturity- |
| 19 | III-IV | 67 | 2024 | 46,033,555 | 47,088,579 | −2.51 | 0.29 | Maturity- |
| 12 | 0-I | 100 | 534 | 5,163,152 | 6,044,298 | −2.49 | 0.17 | Pubescence form |
| 11 | 0-I | 86 | 1434 | 33,511,555 | 34,570,537 | −2.48 | 0.14 | |
| 11 | V+ | 100 | 634 | 8,633,864 | 9,963,410 | −2.46 | 0.2 | |
| 18 | 0-I | 100 | 2134 | 47,719,925 | 48,017,046 | −2.45 | 0.4 | |
| 14 | III-IV | 100 | 1204 | 19,264,654 | 23,701,369 | −2.39 | 0.22 | |
| 12 | V+ | 67 | 834 | 8,419,651 | 9,023,940 | −2.35 | 0.23 | |
| 8 | V+ | 83 | 154 | 2,331,207 | 2,729,661 | −2.35 | 0.31 | |
| 9 | 0-I | 100 | 424 | 5,002,375 | 5,769,646 | −2.35 | 0.36 | |
| 6 | V+ | 67 | 1214 | 18,916,841 | 21,745,751 | −2.32 | 0.38 | Maturity- |
Chr., chromosome; Pop., population; QTL, quantitative trait locus.
Percent of founding ancestors possessing the major haplotype.
Markers are indexed from 1 to the total markers for that chromosome based on genomic position as depicted in Figure 3.
Based on cloned genes deposited in Soybase and publicly available data from three genome-wide association studies (Vaughn ; Sonah ; Zhou ).
Identified as improvement sweeps in Wen or Zhou .
Tagging markers for haplotype blocks putatively selected in one population but fixed in the opposite direction in another
| Chr. | Marker Index | Position | Ref. | MG 0-I Δf | MG 0-I Final Freq. | MG III-IV Δf | MG III-IV Final Freq. | MG V+ Δf | MG V+ Final Freq. |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 39 | 558,323 | C | −9.8 | 0 | 3.8 | 0.64 | 19.08* | 0.95 |
| 2 | 337 | 4,551,551 | C | −14.15* | 0.31 | −5.13 | 0.64 | 11.37 | 0.96 |
| 2 | 1047 | 11,998,550 | C | −13.35* | 0.31 | −0.24 | 0.59 | 5.71 | 0.96 |
| 3 | 59 | 592,600 | T | 0 | 0.96 | −6.04 | 0.68 | −15.51* | 0 |
| 6 | 355 | 7,683,418 | A | −14.61* | 0.06 | 2.97 | 0.64 | 0 | 1 |
| 6 | 1201 | 19,407,046 | A | −0.23 | 0.88 | 32.01* | 0.96 | −5.07 | 0.02 |
| 7 | 595 | 8,112,122 | C | −0.38 | 0.94 | −5.04 | 0.55 | −17.86* | 0 |
| 13 | 1495 | 28,550,563 | A | −13.81* | 0 | −0.68 | 0.93 | 6.85 | 0.81 |
| 13 | 2263 | 36,616,135 | A | −15.76* | 0.06 | −14.44 | 0.32 | 1.22 | 0.95 |
| 15 | 677 | 9,508,185 | G | 3.94 | 1 | −7.03 | 0.55 | −14.7* | 0.25 |
| 15 | 682 | 9,544,360 | T | −14.32* | 0 | 3.95 | 0.82 | 8.93 | 1 |
| 15 | 857 | 11,416,165 | G | 17.53* | 0.97 | −2.75 | 0.59 | −5.97 | 0.05 |
| 17 | 468 | 6,742,263 | C | 0 | 1 | −10.4 | 0.36 | −22.32* | 0 |
| 18 | 2555 | 53,152,286 | C | −4.87 | 0.06 | −4 | 0.59 | 16.84 | 0.95 |
| 19 | 1817 | 42,812,863 | T | 3.74 | 0.5 | 20.49* | 0.96 | −8.44 | 0.04 |
| 20 | 1450 | 44,469,797 | A | 21.16* | 1 | 4.88 | 0.8 | 0 | 0 |
Chr., chromosome; Ref., reference; Δf, frequency change per generation; Freq., frequency.
For cross-reference to figures in File S2.
Δf and final frequencies are relative to the major allele in MG 0-I, pre-1970s sample.
Δf values (frequency change per generation) are multiplied by 1000 for ease of presentation; asterisks (*) indicate regions that are beyond the population thresholds given in Table 5.
Though part of the same linkage block, two representative markers are given because of high and contrasting rates in both MG 0-I and MG V+.
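To make the scaling in the table concrete, the sketch below estimates Δf as the ordinary least-squares slope of allele frequency on generation across sampled timepoints and then multiplies by 1000. Using a regression slope is an assumption: the record only defines Δf as frequency change per generation. The function names and toy data are hypothetical.

```python
def delta_f_per_generation(generations, frequencies):
    """Least-squares slope of allele frequency against generation."""
    n = len(generations)
    mean_t = sum(generations) / n
    mean_p = sum(frequencies) / n
    sxx = sum((t - mean_t) ** 2 for t in generations)
    sxy = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(generations, frequencies))
    return sxy / sxx


def scaled_delta_f(generations, frequencies, factor=1000):
    """Delta f multiplied by `factor`, matching the table's presentation."""
    return factor * delta_f_per_generation(generations, frequencies)


# Toy trajectory: an allele rising by 0.1 per generation over five samples.
gens = [0, 1, 2, 3, 4]
freqs = [0.2, 0.3, 0.4, 0.5, 0.6]
print(scaled_delta_f(gens, freqs))
```

Because frequencies in the table are expressed relative to the major allele in the MG 0-I pre-1970s sample, a negative value means that allele declined in the population in question.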