Peter Civáň, Konstantina Drosou, David Armisen-Gimenez, Wandrille Duchemin, Jérôme Salse, Terence A Brown.
Abstract
BACKGROUND: Barley is one of the founder crops of Neolithic agriculture and is among the most-grown cereals today. The only trait that universally differentiates the cultivated and wild subspecies is 'non-brittleness' of the rachis (the stem of the inflorescence), which facilitates harvesting of the crop. Other phenotypic differences appear to result from facultative or regional selective pressures. The population structure resulting from these regional events has been interpreted as evidence for multiple domestications or a mosaic ancestry involving genetic interaction between multiple wild or proto-domesticated lineages. However, each of the three mutations that confer non-brittleness originated in the western Fertile Crescent, arguing against multiregional origins for the crop.
Keywords: Barley; Exome sequences; Fertile Crescent; Gene flow; Hordeum vulgare; Origins of agriculture; Selection; Selective sweep
Year: 2021 PMID: 33794767 PMCID: PMC8015183 DOI: 10.1186/s12864-021-07511-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Structure and geography of barley populations. a The top two PCs of the nucleotide diversity in wild and cultivated barley. Wild barley accessions are marked as crosses and domesticated accessions as full circles. Several accessions previously described as wild, but collected outside of the primary distribution range, were labelled as feral/hybrid (full triangles) and excluded from further analyses (see Additional file 1: Table S1). None of the top 20 PCs placed cultivated barley in separate clusters (not shown). b–d PCAs of the cultivated barleys (wild barley excluded). On all three panels, group membership is indicated with the same colours as on the map below. e Geography of the domesticated groups defined by the PCA of cultivated barley
Fig. 2 Inference of population splits with various numbers of population mixtures. TreeMix population graphs (left) and the residual matrices (right) are shown for modelling a zero migration (admixture) events, b 1 event, c 2 events, d 3 events, and e 4 events. Admixture is indicated by arrows that are coloured according to the inferred relative genetic contribution. All shown admixture edges improve the fit of the graphs to the data with the highest significance (p < 2.22507 × 10^−308), except the group II → group III migration in panel d (p = 2.10942 × 10^−15), and the group IV → (group III, group VI) migration in panel e (p = 1.11022 × 10^−16). The residual matrices quantify the inter-group covariance of allelic frequencies not captured by the respective graphs, and thus indicate pairs of populations where additional gene flow edges might improve the fit
Table 1 ABBA-BABA-related statistics
| Four-taxon set | Best tree (according to the BBAA count with fixed outgroup) | D-statistics (excess of ABBA patterns) | fG (genomic fraction shared through gene flow) | Gene flow significance (FWER correction) * |
|---|---|---|---|---|
| One domesticated group | (((wild-W, group IV), wild-E), O) | 0.0198 | 0.0202 | |
| | (((wild-E, group VI), wild-W), O) | 0.0132 | 0.0171 | |
| | (((group I, wild-E), wild-W), O) | 0.0118 | 0.0177 | |
| | (((group II, wild-W), wild-E), O) | 0.0133 | 0.016 | |
| | (((group V, wild-E), wild-W), O) | 0.0043 | 0.0069 | |
| | (((group III, wild-W), wild-E), O) | 0.0053 | 0.0068 | |
| Two domesticated groups | (((group II, group V), wild-E), O) | 0.0586 | 0.0614 | ** |
| | (((group III, group V), wild-E), O) | 0.0501 | 0.0543 | ** |
| | (((group I, group IV), wild-W), O) | 0.0374 | 0.0501 | ** |
| | (((group II, group I), wild-E), O) | 0.0552 | 0.0610 | ** |
| | (((group I, group VI), wild-W), O) | 0.0321 | 0.0386 | ** |
| | (((group II, group VI), wild-E), O) | 0.0395 | 0.0404 | ** |
| | (((group III, group I), wild-E), O) | 0.0468 | 0.0545 | * |
| | (((group II, group IV), wild-E), O) | 0.0430 | 0.0438 | * |
| | (((group III, group VI), wild-E), O) | 0.0322 | 0.0321 | * |
| | (((group V, group IV), wild-W), O) | 0.0315 | 0.0414 | * |
| | (((group V, group VI), wild-W), O) | 0.0233 | 0.0288 | * |
| | (((group I, group II), wild-W), O) | 0.0290 | 0.0365 | * |
| | (((group V, group II), wild-W), O) | 0.0214 | 0.0268 | |
| | (((group I, group III), wild-W), O) | 0.0287 | 0.0342 | |
| | (((group V, group III), wild-W), O) | 0.0210 | 0.0242 | |
| | (((group III, group IV), wild-E), O) | 0.0331 | 0.0353 | |
| | (((group VI, group I), wild-E), O) | 0.0202 | 0.0217 | |
| | (((group VI, group V), wild-E), O) | 0.0207 | 0.0219 | |
| | (((group II, group III), wild-E), O) | 0.0119 | 0.0101 | |
| | (((group IV, group I), wild-E), O) | 0.0154 | 0.0175 | |
| | (((group III, group IV), wild-W), O) | 0.0119 | 0.0170 | |
| | (((group VI, group IV), wild-W), O) | 0.0102 | 0.0129 | |
| | (((group IV, group V), wild-E), O) | 0.0166 | 0.0177 | |
| | (((group II, group IV), wild-W), O) | 0.0106 | 0.0141 | |
| | (((group I, group V), wild-W), O) | 0.0095 | 0.0115 | |
| | (((group III, group VI), wild-W), O) | 0.0022 | 0.0029 | |
| | (((group VI, group IV), wild-E), O) | 0.0032 | 0.0033 | |
| | (((group III, group II), wild-W), O) | 0.0013 | 0.0016 | |
| | (((group II, group VI), wild-W), O) | 0.0008 | 0.0011 | |
| | (((group V, group I), wild-E), O) | 0.0003 | 0.0003 | |
Abbreviations: wild-E wild superpopulation east of the Euphrates, wild-W wild superpopulation west of the Euphrates
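For readers unfamiliar with the D-statistic column, the following is a minimal sketch of how an excess of ABBA over BABA patterns can be computed from per-site derived-allele frequencies in a four-taxon set (((P1, P2), P3), O). It uses the standard frequency-based form of Patterson's D. The function name `patterson_d` and its inputs are illustrative assumptions, not the pipeline used to produce the table.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Patterson's D for the tree (((P1, P2), P3), O).

    Each argument is a sequence of derived-allele frequencies (0..1),
    one value per site. Hypothetical helper for illustration only.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, outgroup):
        # ABBA: P2 and P3 share the derived allele, P1 and O are ancestral.
        abba += (1 - f1) * f2 * f3 * (1 - fo)
        # BABA: P1 and P3 share the derived allele, P2 and O are ancestral.
        baba += f1 * (1 - f2) * f3 * (1 - fo)
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)
```

A positive value indicates that P2 shares more derived alleles with P3 than P1 does, which is the pattern summarized as "excess of ABBA patterns" in the table. Significance testing (for example by block jackknife) is not shown.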
Fig. 3 Ancestry-informative SNPs in the exome data. The left panels present the distribution of SNPs on joint allele frequency spectra, and delineate ancestry-informative frequency classes (dashed lines). Observed proportions of variants in each frequency class (O) were logarithmically transformed and expressed by a colour gradient. The right panels show similarity of wild accessions to the selected variant sets, measured as identity-by-state (IBS). a Variants with frequencies > 0.5 in all cultivated groups and their frequency distribution in wild barley (left). The dashed line delineates 5666 ancestry-informative variants and their occurrence in wild accessions is depicted on the map (right). Note that although the allele frequency spectrum shows allele frequencies for the entire cultivated supergroup, we only selected variants that are truly major in each of groups I–VI. b Frequency distribution of major group I variants in the remaining cultivated population (left) and occurrence of the selected ancestry-informative alleles in wild accessions (right). c–g Equivalent description of major variants in groups II–VI, respectively
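The selection logic described for panel a can be sketched as follows: keep variants whose allele is major (frequency > 0.5) in every cultivated group and falls in a low-frequency class in wild barley, then score each wild accession by how often it carries those alleles. The wild-frequency cutoff `wild_max`, the function names, and the simplified IBS score are assumptions made for illustration; the caption does not state the cutoff marked by the dashed line.

```python
def select_informative(group_freqs, wild_freqs, wild_max=0.1):
    """Indices of variants that are major in every cultivated group
    and at most `wild_max` in wild barley (hypothetical cutoff)."""
    return [
        i
        for i in range(len(wild_freqs))
        if all(freqs[i] > 0.5 for freqs in group_freqs.values())
        and wild_freqs[i] <= wild_max
    ]


def ibs_to_set(genotype, selected):
    """Simplified identity-by-state score of one diploid accession.

    `genotype[i]` is the number of copies (0, 1 or 2) of the selected
    allele at variant i, or None for missing data. Returns the fraction
    of selected alleles carried, or None if nothing was called.
    """
    calls = [genotype[i] for i in selected if genotype[i] is not None]
    if not calls:
        return None
    return sum(calls) / (2 * len(calls))
```

For example, with frequencies given per group as lists indexed by variant, `select_informative` returns the variant indices, and `ibs_to_set` can then be evaluated for each wild accession and plotted by collection site.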
Fig. 4 Population history of cultivated barley. Geographical summary of the population history reconstructed from all collected evidence. The pie charts indicate the proportions within each group of the indehiscence alleles btr1 and btr2 (green; see Additional file 4: Fig. S3 for details) and of 2- and 6-rowed barleys (purple). The natural distribution range of wild barley is approximated with yellow shading. The likely locations of the inferred gene flow between wild and cultivated barley are indicated by dashed lines
Fig. 5 Selective sweeps within the barley genomes. The barley genome, comprising chromosomes 1H–7H, is presented in a circular layout [31] with a different concentric track for each of the geographically defined groups I–VI. Within each track, the values for Tajima’s D and the diversity reduction index [log2(DRI)] are indicated by the red and dark blue plots, respectively, sharing the y-axis with a range [−3, 9]. The identified sweep regions are highlighted. The degree of similarity to the 6ky barley genome is indicated at the outer edge of each track by the greyscale line (gradation from white for < 83% similarity to black for > 99% identity, calculated for each genomic window as the proportion of major alleles matching the 6ky variants). In the outer track each chromosome is represented as a box, with the centromere indicated by the crossbar and the physical coordinates (Mb) marked. Positions of previously described domestication genes [15, 16] and the genes with protein-changing variants identified in this study (see Table 2) are shown in the outermost track
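As a reminder of what the two plotted statistics measure, here is a minimal sketch of Tajima's D for one genomic window, computed from the sample size, the number of segregating sites, and the mean pairwise difference. The companion function assumes that DRI is the ratio of wild to cultivated nucleotide diversity; that definition is an assumption here, since the caption does not spell it out. Both function names are hypothetical.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from n sequences, S segregating sites and mean
    pairwise difference pi (standard 1989 formulation)."""
    if n < 2 or seg_sites == 0:
        return None  # statistic undefined for this window
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


def log2_dri(pi_wild, pi_cultivated):
    """log2 of an assumed DRI = pi_wild / pi_cultivated."""
    return math.log2(pi_wild / pi_cultivated)
```

Under this reading, a window with strongly negative Tajima's D together with a high log2(DRI) in a cultivated group is the kind of signal that would be highlighted as a candidate sweep.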
Fig. 6 UpSet-style plot [34] summarizing the swept genomic regions and their intersection sizes. The black/red bars on the left indicate the portion of the genome that was classified as hard sweeps in each of the six cultivated groups (e.g. ~15% of the genome [0.15 genome fraction] is swept in group I). The main graph then provides details of the components of these sweeps that are group-specific or shared with one or more other groups. The y-axis indicates the fraction of the reference genome, and the graphics under the x-axis reveal the group(s) with which each fraction is associated. The first six columns show how much of the genome is covered by group-specific hard sweeps. For example, column 1 shows that group V-specific sweeps have an intersection size of 0.066 and hence cover 6.6% of the reference genome. The subsequent columns show the sizes of the aggregated intersections. For example, column 7 shows that the spatial overlap between the sweeps in groups II and III (indicated by black circles) and any other group that intersects this overlap (indicated by the black dots within the grey circles) has an intersection size of 0.141 and so covers 14.1% of the reference genome. The asterisks above the columns indicate those intersection sizes that are significantly larger than a stochastic overlap of independent sweeps, due to shared selection history or parallel targeting. Conversely, absence of significance indicates that intersections of the same or smaller size could occur simply by chance with the given number and length of independent sweeps. Throughout the figure, red is used to show those fractions with > 99% sequence similarity to the 6ky barley. For example, almost all the sequences simultaneously swept in groups II and III (column 7) have > 99% identity to the corresponding sequences in the 6ky barley. The last column, on the extreme right, shows the size of the regions that are simultaneously swept in all six groups. Although this region represents only 0.005% of the reference genome, it is still significantly larger than would be expected by chance. Its similarity to the 6ky barley is > 99%, indicating early selection of this sequence
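The idea of testing an intersection size against "a stochastic overlap of independent sweeps" can be illustrated with a simple permutation sketch: keep the number and length of sweeps in each group, place them at random positions, and ask how often the random intersection is at least as large as the observed one. This is only an illustrative assumption about how such a test could be set up; the genome is simplified to a circular array of equal windows, randomly placed sweeps within one group may overlap each other, and all names are hypothetical.

```python
import random


def covered(intervals, n_windows):
    """Window indices covered by (start, length) sweeps, wrapping around."""
    out = set()
    for start, length in intervals:
        for k in range(length):
            out.add((start + k) % n_windows)
    return out


def shared_fraction(groups, n_windows):
    """Fraction of windows swept in every group simultaneously."""
    sets = [covered(intervals, n_windows) for intervals in groups]
    return len(set.intersection(*sets)) / n_windows


def overlap_p_value(groups, n_windows, n_perm=1000, seed=0):
    """Observed shared fraction and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = shared_fraction(groups, n_windows)
    hits = 0
    for _ in range(n_perm):
        # Re-place every sweep at a random start, preserving its length.
        shuffled = [
            [(rng.randrange(n_windows), length) for _, length in intervals]
            for intervals in groups
        ]
        if shared_fraction(shuffled, n_windows) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `groups` is a list with one entry per cultivated group, each entry being a list of `(start_window, length_in_windows)` sweeps. A small p-value corresponds to a column marked with an asterisk in the plot.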
Table 2 Protein-changing variants with contrasting frequencies in wild and cultivated barley
| Location (Morex V1) | Gene (Morex V1) | Gene (Morex V2) | Variant type | Variant in wild barley (frequency) | Variant in domesticated barley (frequency) | Variant in the 6ky barley | Groups where the position falls under a hard sweep | Significant BLASTP hit (species; query coverage; percent identity) |
|---|---|---|---|---|---|---|---|---|
| chr1H: 106920539 | HORVU1Hr1G024040 | – | NS SNP | Ala (0.902) | Val (0.987) | Domesticated | I, II, III, V, VI, supersample | – |
| chr1H: 165877256 | HORVU1Hr1G029720 | – | Indel | Wild-type transcripts .18 and .19 have a premature stop codon (0.902) | Transcripts .18 and .19 are 10 codons and 35 codons longer (0.981) | Missing data | I, III, V, VI, supersample | – |
| chr1H: 527209977 | HORVU1Hr1G081150 | – | NS SNP | Ala (0.938) | Thr (0.986) | Wild-type | I, II, IV, supersample | – |
| chr1H: 545254976 | HORVU1Hr1G090080 | HORVU.MOREX.r2.1HG0074340 | NS SNP | Arg (0.955) | Trp (0.995) | Domesticated | II, III, IV, V, VI, supersample | SPA2 (suppressor of phyA-105) ( |
| chr3H: 135052223 | HORVU3Hr1G029370 | HORVU.MOREX.r2.3HG0204760 | NS SNP | Glu (0.951) | Asp (0.987) | Domesticated | IV | MRH1/MDIS2 ( |
| chr5H: 592511492 | HORVU5Hr1G093710 | – | NS SNP | Pro (0.906) | Leu (0.992) | Domesticated | I, V, VI | – |
| chr5H: 593178296 | HORVU5Hr1G093850 | HORVU.MOREX.r2.5HG0422940 | NS SNP | Leu (0.911) | Val (0.982) | Domesticated | I, V, VI | Phototropic-responsive NPH3 family protein ( |
| chr7H: 540527349 | HORVU7Hr1G089090 | – | NS SNP | Val (0.928) | Ala (0.995) | Domesticated | II, III | – |
| chr7H: 552413998 | HORVU7Hr1G090560 | HORVU.MOREX.r2.7HG0597530 | Indel | Wild-type C-terminus (0.996) | 3-codon difference at the C-terminus (0.979) | Domesticated | II, III, V, VI | Cinnamyl alcohol dehydrogenase ( |
| chr7H: 552532125 | HORVU7Hr1G090580 | HORVU.MOREX.r2.7HG0597570 | NS SNP | Leu (0.902) | Ile (0.990) | Domesticated | II, III, V, VI | – |
Morex V1 refers to the barley genome assembly of Mascher et al. [19]; Morex V2 is the subsequent assembly of Monat et al. [32]
Abbreviations: NS SNP non-synonymous SNP. For domesticated barley variants, the frequency is > 0.9 in each of the six groups. Only BLASTP hits with > 20% query coverage and > 40% sequence identity against characterized proteins are reported