Literature DB >> 34849796

Genome-wide association study for morphological, phenological, quality, and yield traits in einkorn (Triticum monococcum L. subsp. monococcum).

Andrea Volante1, Delfina Barabaschi2, Rosanna Marino3, Andrea Brandolini3.   

Abstract

Einkorn (Triticum monococcum L. subsp. monococcum, 2n = 2× = 14, AmAm) is a diploid wheat whose cultivation was widespread in the Mediterranean and European area till the Bronze Age, before it was replaced by the more productive durum and bread wheats. Although scarcely cultivated nowadays, it has gained renewed interest due to its relevant nutritional properties and as source of genetic diversity for crop breeding. However, the molecular basis of many traits of interest in einkorn remain still unknown. A panel of 160 einkorn landraces, from different parts of the distribution area, was characterized for several phenotypic traits related to morphology, phenology, quality, and yield for 4 years in two locations. An approach based on co-linearity with the A genome of bread wheat, supported also by that with Triticum urartu genome, was exploited to perform association mapping, even without an einkorn anchored genome. The association mapping approach uncovered numerous marker-trait associations; for 37 of these, a physical position was inferred by homology with the bread wheat genome. Moreover, numerous associated regions were also assigned to the available T. monococcum contigs. Among the intervals detected in this work, three overlapped with regions previously described as involved in the same trait, while four other regions were localized in proximity of loci previously described and presumably refer to the same gene/QTL. The remaining associated regions identified in this work could represent a novel and useful starting point for breeding approaches to improve the investigated traits in this neglected species.
© The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America.

Entities:  

Keywords:  einkorn; genome-wide association study (GWAS); marker-trait associations (MTAs); wheat

Mesh:

Year:  2021        PMID: 34849796      PMCID: PMC8527505          DOI: 10.1093/g3journal/jkab281

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Introduction

Plant breeding has significantly improved the production of many crops but has often exerted a bottleneck effect on the genetic diversity of several characters (Bedő and Láng 2015; Louwaars 2018). Modern germplasm, such as wheat, is highly unbalanced compared to the ancestral one found in landraces due to recent selection and spread. The characterization of these genetic resources is thus strategic to better exploit them efficiently into pre-breeding programs (Balfourier ). The exploitation of ancient landraces or wild species requires in-depth phenotypic and genetic characterization to detect superior phenotypes and to identify their underlying genes and/or linked genetic markers (Liu ; Zhou ; Garcia ; Rahimi ; Kumar ; Sansaloni ). Germplasm banks currently host and preserve extensive collections of cultivars, landraces, and wild relatives of most crops, which are an excellent source of genetic variability for allele mining of important traits (Leung ; Sehgal ; Brozynska ; Zhang ; Engels and Thormann 2020). The study of this largely untapped biodiversity would provide both the opportunity to recover new allelic variation, urgently required by breeding programs, and to work on the improvement of key traits directly in ancient materials. Einkorn (Triticum monococcum L. ssp. monococcum) is a hulled diploid wheat (2n = 2× = 14) which was domesticated during the Neolithic Age in the Karacadağ mountain range (Turkey) from its wild ancestor T. monococcum L. ssp. boeoticum (Heun , 2008; Haldorsen ). Einkorn quickly spread by different routes of migration throughout Europe (Brandolini ), where it was widely used as human food for millennia before being gradually replaced, from the Bronze Age, by the polyploid wheats, characterized by higher production and free-threshing grains. Nowadays einkorn is cultivated in marginal regions and only recently, breeding efforts were started in several European countries (Hidalgo and Brandolini 2019). Increased awareness on the nutritional properties of foods and recent trends toward low-impact and sustainable agriculture have led to a renewed interest for this ancient wheat. Einkorn composition and nutritive values are different, often better, from those of bread and durum wheat. For example, einkorn kernels are rich in proteins, lipids (mostly unsaturated fatty acids), fructan and trace elements, including Zn and Fe. Additionally, T. monococcum grains have good concentration of several antioxidant compounds (carotenoids, tocols, conjugated polyphenols, alkylresorcinols, and phytosterols) that, along with low beta-amylase and lipoxygenase activities (which limit antioxidants degradation during food processing), confer to the flour excellent nutritional benefits, superior to those of all the other wheats (Hidalgo and Brandolini 2019). Although not suitable for coeliacs (Zanini ,b), einkorn elicits weaker toxic reactions than other Triticum species (Mølberg ; Vincentini ; Di Stasio ) and thus might be tolerated by patients with some types of food sensitivities (Lombardo ). The T. monococcum genome shows a high gene conservation and co-linearity with durum and bread wheat, even though the polyploid species inherited their A genome from Triticum urartu (Dubcovsky ; Bai ; Brandolini ). Some einkorn genetic maps are available in literature (Dubcovsky ; Kojima ; Taenzler ; Singh ; Jing ; Yu ) and a high-resolution map based on Diversity Arrays Technology and sequencing (DArTseq) markers was recently developed (Marino ). Nevertheless, only few data exist about the molecular basis of traits of interest in einkorn. The soft glumes (Sog, on 7L) trait, the loci coding for seed storage proteins, and Quantitative Trait Loci (QTL) for SDS sedimentation volume and protein content have been located by Taenzler . Alvarez mapped the earliness-related locus Eps-Am 1 in einkorn genome (on 1L) and identified EARLY FLOWERING 3 (ELF3) as the underlying candidate gene, while Chen studied and mapped two stem rust resistance genes (SrTm5 on 7L and Sr60 on 5S). Very recently, Yu analyzed grain size-related traits and identified the involved loci, by linkage and homology mapping and transcriptome profiling. The new approach to genetic resources based on second generation sequencing techniques (genotype by sequencing, DArTseq) has paved the way to a sudden improvement of knowledge about the genomic regions involved in the control of traits of interest by genome-wide association studies (GWASs). However, the full exploitation of these genetic tools in einkorn has long been limited by the poor characterization of the available germplasm. Nonetheless, Jing using association mapping characterized the phenotypic variation and identified several loci involved in the control of traits related to stress adaptation, phenology, and morphology of this species. In this study, a panel of 160 T. monococcum ssp. monococcum landraces was subjected to association mapping analysis for complex agronomic traits related to phenology, morphology, yield, and quality. These accessions were genetically characterized with a DArTseq approach, which yielded two types of markers: (i) silicoDArT and (ii) SNP markers. Numerous marker-trait associations (MTAs), related to the traits analyzed, were identified; three of these overlapped with previously described loci, while four were located in proximity. The remaining associations represent mainly novel loci involved in these traits.

Materials and methods

Plant material, phenotyping, and statistical analysis

The panel used in this study consisted of 160 T. monococcum ssp. monococcum landraces, belonging to a core collection maintained at CREA in Sant’Angelo Lodigiano (Lodi, Italy). The set included material coming from different sites across Europe and neighboring areas (Morocco, Turkey, Near East and Caucasus). The accessions tested, selected mainly on passport data and previous research (Brandolini ; Volante et al. 2020), represent a well-balanced sampling of current T. monococcum distribution and diversity (Heun ; Empilli ). Several accessions for each geographic location were included, depending on availability, as well as selections with good bread making quality. This latter was assessed by micro tests (SDS sedimentation test; Preston ), Brabender farinograph and bread baking according to method 10–10.3 (AACC International 2012a). For further description and details, see Supplementary Table S1, Brandolini and Volante et al. (2020). The field trials were performed in two locations (Sant’Angelo Lodigiano: 45°14′21″N 9°24′22″E, and Lodi: 45°18′12″N 9°30′45″E) for 4 years (from 2010–2011 to 2013–2014). About 50 plants per accession were planted in one single 2 m row for each environment. Standard cultural practices were applied, including nitrogen fertilization (40 kg/ha) at tillering and chemical weed control (Ariane II: Fluroxipir + Clopiralid + MCPA). The spikes were manually harvested at full maturity and seeds stored at 5°C. Whole meal flour was prepared from 10 g de-hulled kernels using a Cyclotec 1093 laboratory mill (FOSS Tecator, Denmark) and stored at −20°C until analysis. The landraces were characterized for thirteen morpho-agronomic descriptors: heading date (HDA, number of days, starting from May 1st, when about 50% of the plants shows emerging spikes), plant height (PLH, excluding awns, in cm), awn length (AWN, from 1 = no awns to 5 = very long awns, i.e., longer than spikes), glume color (GLC, 1 = white-cream, 2 = brown, 3 = black, 4 = two colors), and glume hairiness (GLH, 1 = no hair, 2 = short, sparse, 3 = short, thick, 4 = long, thick), rachis brittleness (RAB, 1= very brittle to 5 = non-brittle), spike length excluding awns (SPL, in mm), n° spikelets/spike (SPS), n° kernels/spikelet (GPS), grain length (GRL, in mm), grain width (GRW, in mm), grain thickness (GRT, in mm), and thousand grains weight (TGW, computed from two 100-kernel samples). The data were collected from 10 plants, 10 spikes, or 20 kernels for each landrace. Additionally, three quality traits were assessed: protein content (PRO, N × 5.7, dry matter basis) determined on whole meal according to method 46-10.01 (AACC International 2012b), using a NIR System Model 6500 (FOSS NIRSystems, Laurel, MD, USA), sodium dodecyl sulphate sedimentation volume (SDS) recorded according to Preston , and carotenoid content (CAR) assessed following method 14-60.01 (AACC International 2012c) using a DU-62 spectrophotometer (Beckman Coulter Inc., Brea, CA, USA). The qualitative descriptors were measured from at least two technical replicates. An analysis of variance (ANOVA) was conducted throughout the set of phenotypic data, using genotypes and years as factors; the locations were used as repetitions. Broad sense heritability (H2) was calculated according to Nyquist (1991): where σ2G is the genetic variance, σ2GE is the genotype × environment interaction variance, σ2e is the residual variance, E is the number of environments, and r the number of replicates. All analyses were performed using the STATGRAPHICS® Centurion XV (Statgraphics Technologies, Inc., The Plains, VA, USA) statistical program.

Genotypic data and in silico mapping

The genotypic data for the 160 accessions consisted of a set of 33,260 DArTseq markers, described in Brandolini , including both SNP and SilicoDArTs. DNA extraction from 7-day-old seedlings (one plant per accession) was performed with a standard CTAB protocol as described by Stein . DNA concentration was quantified with a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA). The DNA was analyzed with the DArT-seq technology (Kilian ) by the company Diversity Arrays Technology (Canberra, Australia). Twelve genotypes (i.e., the very same DNA) were fingerprinted twice (unknown to the lab in Australia). The original panel was filtered for callrate and minor allele frequency (MAF) using the PLINK software (Purcell ). Markers with a callrate lower than 90% and with MAF lower than 5% were discarded. An einkorn physical map was produced in silico using the most exhaustive and comprehensive genome available for wheats so far, that of Triticum aestivum. A physical position of DArTseq markers on bread wheat genome was determined by aligning the corresponding nucleotide sequences (https://www.diversityarrays.com/technology-and-resources/sequences/), supplied by Diversity Array Technologies, to the A genome of the RefSeq v1.0 reference genome (IWGSC 2018). The einkorn genetic map of Marino was used to orient the physical map developed in this work. The physical positions of 184 markers in common between these two maps were exploited as a scaffold (Supplementary Table S2). A link to the T. monococcum contigs and to T. urartu pseudomolecules was also acquired blasting the same marker reads against TGAC WGS monococcum v1 (available at https://wheat-urgi.versailles.inra.fr) and against GCA_003073215.1 assembly (Ling ), respectively. BLASTN searches were performed using the tool available at the URGI (https://wheat-urgi.versailles.inra.fr/Seq-Repository/BLAST) for T. aestivum and monococcum genome; while for T. urartu BLAST was conducted locally on downloaded genome. All the analyses were done using BLASTN with these cutoffs: a minimum coverage of 99% (only one gap is admitted) and an identity over or equal to 97% (no more than two mismatches).

Analysis of population structure

The analysis of genetic stratification in the einkorn panel was performed with two different methods: a principal component analysis (PCA) by the Tassel v5.2.0 software (Bradbury ) and a phylogenetic clustering obtained utilizing the MEGA7 software (Kumar ) with the neighbor-joining method and default parameters. A Bayesian model-based analysis was achieved using the program Structure v2.3.4 (Pritchard ). The parameters used were the following: presence of admixture, allele frequencies correlated, a burn-in period of 10,000 iterations, followed by 20,000 Monte Carlo Markov Chain (MCMC) replications, K levels from 1 to 10 and 5 runs per K-value. The free software Structure Harvester (Earl and von Holdt 2012) was used to determine the best number of clusters (K) according to the Evanno method. Once defined the most probable K-value, a final single run was performed using the same parameters listed above, but with a higher number of burn-in and MCMC iterations (100,000 and 200,000, respectively). Accessions with a 0.7 minimum membership (i.e., the probability of one individual to belong to a subgroup identified by Structure) were assigned to a subpopulation, while the remaining were considered as admixed. The number and composition of the clusters identified were integrated into the results of the neighbor-joining analysis and the PCA for cross-validation.

Analysis of linkage disequilibrium decay

The computation of pairwise linkage disequilibrium (LD; r2) was conducted as in Volante et al. (2017). Briefly, the whole set of markers was analyzed with the R package “LDcorSV v1.3.1” (Mangin ), using as covariates the Structure membership matrix and the kinship matrix calculated by Tassel. The values were binned in 10 kb windows as in Biscarini . For each of the nine linkage groups (i.e., all the chromosomes including the split 2 and 4), the median value of each 10-Kb interval was calculated (this was used instead of the mean to account for a non-normal distribution of LD values at short and long-distance classes). The resulting values were plotted against physical distance and fitted to a second-degree LOESS curve using an R script (Cleveland 1979; Marroni ). A critical value of 0.2 was set as r2 between unlinked loci. The physical distance corresponding to a LOESS curve value of 0.2 was assumed as LD decay in the einkorn panel.

Association analysis

For the genome-wide association analysis, to combine data from multiple environments, the Best Linear Unbiased Predictors (BLUPs) of each trait for each accession were calculated using the lmer function of the lme4 R package (Bates ). The model fitted to the data considered genotype, year, and genotype: year interaction as random factors. The ranef function was used to calculated BLUPs, which were then used for the association analysis. To identify the best covariates for each trait, we used the method described by Courtois , i.e., a preliminary GWAS was performed with Tassel, testing a Mixed Linear Model (MLM) corrected with (1) the kinship matrix and (2) the kinship matrix and 5 components of the PCA analysis. In all cases, the Tassel default options (“optimal compression” and P3D mode) were used. The best model was identified for each trait by calculating the Bayesian Information Criterion (BIC) as in Schwarz (1978); the BIC was computed as -2 ln(L) + kln(n) where L is the maximized value of the likelihood function for the estimated model (provided by the GWAS analysis), k is the number of estimated parameters and n is the size of the sample. The model with the lowest BIC value was selected. The final GWAS was then performed with the most appropriate model for each trait, with Tassel using the “no compression” and “no P3D” mode to increase the power of detection (Courtois ). A P-value of the association to the phenotypic traits was computed for each marker; the significance threshold to declare a marker as associated was set to 0.05 after correction for multiple testing, using the false discovery rate method (Benjamini and Hochberg 1995). Manhattan plots and Q–Q plots of each trait were drawn using the R package “qqman” (Turner 2014).

Results

Statistical analysis of phenotypic results

The ANOVA conducted using year and genotype as factors indicated that both had significant effects on the phenotypic variation (Supplementary File S1), highlighting the strong effect of changing yearly climatic conditions and the broad variation available in the einkorn pool tested. This last aspect is evident in the data from Table 1, reporting the average, minimum and maximum values of the traits analyzed, expressed as average of four growing seasons (2011–2012, 2012–2013, 2013–2014, and 2014–2015) and two locations (SAL and LO), as well as their broad-sense heritabilities. These latter had remarkably high values, ranging between 0.896 (GLC) and 0.981 (HDA).
Table 1

 Mean, minimum, maximum values, and broad-sense heritabilities (H2) for 16 variables measured on the 160 einkorn accessions

TraitMeanMinimumMaximumH2
HDA28.1710.2538.500.981
PLH108.8947.66126.880.931
AWN3.901.884.500.947
GLC1.351.003.000.896
GLH1.111.002.500.948
GPS1.080.851.440.902
GRL7.396.618.240.953
GRT1.931.372.220.915
GRW2.882.413.310.922
RAB3.713.134.380.923
SPL7.203.709.700.933
SPS28.4323.2434.840.943
TGW27.2016.0535.660.952
CAR8.184.9211.460.967
PRO17.1114.3620.700.924
SDS19.1911.6356.610.959

AWN, awns length (from 1 = no awns to 5 = very long awns, i.e., longer than spikes); CAR, carotenoids content (mg/kg dry weight); GLC, glume color (1 = white-cream, 2 = brown, 3 = black, 4 = two colors); GLH, glume hairiness (1 = no hair, 2 = short, sparse, 3 = short, thick, 4 = long, thick); GPS, N° kernels/spikelet; GRL, Kernel length (mm); GRT, Kernel thickness (mm); GRW, Kernel width (mm); HDA, heading (number of days); PLH, plant height (cm); PRO, proteins content (% dry weight); RAB, rachis brittleness (from 1 = very brittle to 5 = non-brittle); SDS, SDS sedimentation (mL); SPL, spike length (mm); SPS, N° spikelets/spike; TGW, thousand kernel weight (g).

Mean, minimum, maximum values, and broad-sense heritabilities (H2) for 16 variables measured on the 160 einkorn accessions AWN, awns length (from 1 = no awns to 5 = very long awns, i.e., longer than spikes); CAR, carotenoids content (mg/kg dry weight); GLC, glume color (1 = white-cream, 2 = brown, 3 = black, 4 = two colors); GLH, glume hairiness (1 = no hair, 2 = short, sparse, 3 = short, thick, 4 = long, thick); GPS, N° kernels/spikelet; GRL, Kernel length (mm); GRT, Kernel thickness (mm); GRW, Kernel width (mm); HDA, heading (number of days); PLH, plant height (cm); PRO, proteins content (% dry weight); RAB, rachis brittleness (from 1 = very brittle to 5 = non-brittle); SDS, SDS sedimentation (mL); SPL, spike length (mm); SPS, N° spikelets/spike; TGW, thousand kernel weight (g).

Using wheat genomes to define marker location

After filtering for missing data and MAF that eliminated 61.7% of the data, 12,734 out of 33,260 einkorn DArTs were retained for subsequent analyses (Supplementary File S2). An einkorn physical map was developed in silico assigning to 2037 markers a physical position on the bread wheat reference genome (IWGSC RefSeq v1.0). The genetic map recently published (Marino ) was employed as a scaffold to correctly build and orient the einkorn physical map. Two portions, one for two and one for four chromosomes, showed a marker order conserved but inverted respect to bread wheat genome. Therefore, these chromosomes were split in two portions each (“up” and “down”): the distal portion of chromosome 2 was inverted respect to the physical map of T. aestivum (chr 2_up: from 500,610 to 756,929,079 bp and chr 2_down from 780,777,947 to 756,949,663 bp), while in the case of chromosome 4, the two portions were swapped (chr 4_up from 610,263,195 to 738,781,733 bp and chr 4_down from 2,411,431 to 608,561,225 bp of bread wheat respectively; Supplementary Table S2). The number of DArT markers was highest in chromosome 2 (2_up plus 2_down) (386), while chromosome 4 (4_up plus 4_down) had the lowest number (133) (Supplementary Table S2). Mapped DArTs included 16% of the dataset. Sequences of DArT markers were also aligned against the recently released version of T. urartu genome (Ling ): 1673 markers, corresponding to 13% of the whole dataset, were linked to a physical position using the same BLAST criteria adopted with bread wheat genome, with a number of markers/chromosome ranging between 107 (Chromosome 4) and 342 (Chromosome 2; Supplementary Table S4_a). Moreover, following the same pipeline, a link to 3290 einkorn contigs (TGAC WGS monococcum v1) was acquired for 3655 markers (Supplementary Table S4_b).

Population structure

The results of the Principal Component, Structure and Neighbor Joining analyses are shown in Figure 1 and Supplementary Table S1. The most probable Structure model assigned 118 accessions (corresponding to 73.3% of the panel) to 3 clusters (Supplementary Table S1): K1, containing accessions of mixed origin from the Maghreb to the Balcans (red cluster), K2 which mainly included continental accessions (green cluster), and K3 composed mainly of landraces from Turkey, Greece, and Italy (blue cluster). The first three PCs accounted for 27.6% of the variability in the panel (Figure 1A). The clustering coming from PCA was also consistent with that resulted from the neighbor joining tree (Figure 1B); by crossing the results from the three analyses it was evident that the two groups separated by the PC3 in the K1 cluster (red cluster, Figure 1A), corresponded to the accessions from Maghreb and Spain, and those from East-European respectively (Figure 1A). This clustering was consistent with that previously proposed by Brandolini , which in this panel of landraces observed two groups splitting from the main cluster, one of Iberia-Maghreb and one of “prealpine” accessions (mainly from Austria, Germany, and Switzerland).
Figure 1

Analysis of population structure. (A) Principal component analysis (PCA) (the clusters in the legend are those defined by the STRUCTURE analysis); (B) neighbor joining tree. Red = mixed accessions originated from the Maghreb to the Balkans; green = mainly prealpine accessions; blue = mainly landraces from Turkey, Greece, and Italy. The blue-shadowed circles on each branch of the neighbor joining tree show the results of the bootstrap analysis with 1000 iterations, when higher than 0.7. The information about geographic provenance are also reported and color-coded (outer circle).

Analysis of population structure. (A) Principal component analysis (PCA) (the clusters in the legend are those defined by the STRUCTURE analysis); (B) neighbor joining tree. Red = mixed accessions originated from the Maghreb to the Balkans; green = mainly prealpine accessions; blue = mainly landraces from Turkey, Greece, and Italy. The blue-shadowed circles on each branch of the neighbor joining tree show the results of the bootstrap analysis with 1000 iterations, when higher than 0.7. The information about geographic provenance are also reported and color-coded (outer circle).

Analysis of genome-wide linkage disequilibrium decay

The mean chromosome-wise decay of linkage disequilibrium, calculated as r2, was approximately 176 Kbp (Table 2). A heterogeneous distribution was observed over the different chromosomes, ranging from 25 kbp (chr. 6) to 395 Kbp (chr. 5). The average decay plot across all the chromosomes is shown in Figure 2.
Table 2

 Analysis of chromosome-wise LD decay

Chromosome-wise LD decay
Chr.Critical decay distancea (bp)
1245,000
2_up55,000
2_down195,000
3125,000
4_up325,000
4_down135,000
5395,000
625,000
785,000
mean176,111

Corresponding to r2 values of 0.2.

Figure 2

Genome-wide LD decay in einkorn accessions. The red curve represents the second grade LOESS that approximates the r2 values; the horizontal black line shows the threshold r2 value for unlinked markers.

Genome-wide LD decay in einkorn accessions. The red curve represents the second grade LOESS that approximates the r2 values; the horizontal black line shows the threshold r2 value for unlinked markers. Analysis of chromosome-wise LD decay Corresponding to r2 values of 0.2.

Association analyses

In order to account for the interaction genotype × environment on the variability of the traits, association analyses were conducted on BLUP predictors calculated for each accession. The Manhattan and QQplot for each analysis are shown in Figure 3 and Supplementary File S3.
Figure 3

Manhattan plots showing the mapped marker-trait associations for the considered traits. UM = unmapped. The red line represents the significance threshold of 0.05 adjusted for the False Discovery Rate with the Benjamini–Hochberg test (see Materials and Methods for details).

Manhattan plots showing the mapped marker-trait associations for the considered traits. UM = unmapped. The red line represents the significance threshold of 0.05 adjusted for the False Discovery Rate with the Benjamini–Hochberg test (see Materials and Methods for details). A total number of 64 markers with a reliable position on the T. aestivum genome (“mapped”) were associated with the analyzed traits, and defined 37 regions (Figure 3, Table 3), which were identified as trait acronym.chr number/region number. Furthermore, 274 associated but “unmapped” markers were identified (Supplementary Table S3).
Table 3

 Results of the association analysis: mapped marker-trait associations (MTAs)

Results of the genome-wide association mapping analysis
Trait categoryTraitMTA IDPhysical position (bp)a
Peak marker
Chr.StartEndSizeIDPositionAllelesMAF R 2 b
MorphologyAWNAWN.1/011549,283,004566,163,94116,880,937SIL_1218444566,163,941pres/abs6.00%0.17282
MorphologyAWNAWN.5/015473,321,250551,446,71278,125,462SIL_1018217551,446,712pres/abs10.26%0.11534
MorphologyAWNAWN.7/01741,818,19941,877,67159,472SNP_357421741,877,67166:A>G17.95%0.13713
MorphologyGLCGLC.1/011SIL_44099431,846,753pres/abs28.66%0.09793
MorphologyGLCGLC.1/021492,800,956530,809,59338,008,637SNP_2243185492,800,95658:T>C5.16%0.12221
MorphologyGLCGLC.5/015SIL_1281357430,959,218pres/abs8.90%0.09406
MorphologyGLCGLC.6/016SIL_1202445611,334,867pres/abs10.19%0.08933
MorphologyGLCGLC.7/017SIL_1067890724,946,619pres/abs8.86%0.08346
MorphologyGLHGLH.1/011SIL_17236781,674,029pres/abs25.32%0.10053
MorphologyGLHGLH.1/021423,932,005584,961,270161,029,265SNP_100057721537,093,89925:G>C5.03%0.16706
MorphologyGLHGLH.3/013SIL_1267588743,193,869pres/abs8.13%0.08229
MorphologyGLHGLH.4_down/014_down33,622,13134,765,9491,143,818SIL_120293833,646,546pres/abs5.00%0.106
MorphologyGLHGLH.5/015SIL_979056135,162,954pres/abs5.77%0.0956
MorphologyGLHGLH.5/025SIL_1122236686,095,861pres/abs6.25%0.09506
MorphologyGLHGLH.6/016596,658,016603,182,5596,524,543SNP_1190777603,182,55922:A>G5.73%0.1403
MorphologyGLHGLH.7/017592,905,438619,036,44426,131,006SNP_3572864619,036,44415:T>C5.77%0.12362
MorphologyGRTGRT.5/015SIL_2247447665,892,637pres/abs38.82%0.09051
MorphologyGRWGRW.6/016SIL_3572989585,787,274pres/abs38.16%0.09269
MorphologyGRWGRW.7/017SIL_108167920,582,384pres/abs13.92%0.08641
MorphologyPLHPLH.1/011SIL_10926301,673,894pres/abs6.96%0.11749
MorphologyPLHPLH.3/013621,592,828658,048,75536,455,927SNP_2293071649,815,96168:G>A6.29%0.09278
MorphologyPLHPLH.5/015395,852,119461,901,24066,049,121SIL_1204105395,852,119pres/abs23.42%0.12161
MorphologyPLHPLH.7/017SNP_994119545,935,05620:T>C5.66%0.08454
MorphologySPSSPS.5/015SIL_106309536,199,250pres/abs16.35%0.09316
MorphologySPSSPS.7/017SNP_996521464,570,00316:A>G5.03%0.15418
PhenologyHDAHDA.2_up/012_upSNP_3949514501,317,54259:C>G35.44%0.08451
PhenologyHDAHDA.7/017SIL_5003498115,960,770pres/abs15.65%0.14264
PhenologyHDAHDA.7/027SIL_5006335571,054,862pres/abs38.71%0.09722
QualityCARCAR.3/013483,483,183483,483,953770SNP_981918483,483,18319:G>A9.32%0.11867
QualityCARCAR.6/016SIL_4994928572,312,243pres/abs23.33%0.10336
QualityPROPRO.1/011SIL_3575496266,740,723pres/abs6.29%0.08065
QualityPROPRO.3/013SIL_1125220608,744,656pres/abs37.34%0.09107
QualityPROPRO.7/01741,818,19941,877,67159,472SNP_357421741,877,67166:A>G17.95%0.1077
QualitySDSSDS.1/011SNP_100018797568,008,9075:A>T6.21%0.08513
QualitySDSSDS.5/015SNP_1025773665,609,70027:A>G5.19%0.08069
QualitySDSSDS.6/016SIL_357569325,706,107pres/abs10.76%0.09259
QualitySDSSDS.6/026SNP_1088712593,787,61244:A>G5.66%0.09513

On IWGSC RefSeq v1.0.

Phenotypic variance explained by the peak marker.

Results of the association analysis: mapped marker-trait associations (MTAs) On IWGSC RefSeq v1.0. Phenotypic variance explained by the peak marker. For morphology-related traits (AWN, GLC, GLH, GRL, GRW, PLH, and SPS), the analysis detected 25 associated regions with a defined position on the bread wheat genome. The percentage of variance explained by the peak markers (r2) ranged between 8.23% and 17.28%. GLH was the trait with the highest number of MTAs (eight mapped regions). The GWAS on the phenology-related HDA resulted in three associated mapped regions (variance explained ranging between 8.45% and 14.26%), while for quality traits (CAR, SDS, and PRO) the analysis yielded in total nine associated regions (two for CAR, four for SDS, and three for PRO). The variance explained ranged between 7.8% and 15.7%. For yield-related traits, one unmapped marker (SIL_3028492) resulted associated to TGW, with an explained variance of 9.0% (Supplementary Table S4_c). Physically overlapping regions associated to different traits were also detected, e. g. AWN.7/01 and PRO.7/01 completely overlapped, also sharing the same peak marker (SNP_3574217), while GLC.5/01 partially overlapped with PLH.5/01.

Discussion

The renewed interest for cultivated einkorn, for millennia cropped in isolated areas of Middle East, Caucasia, and Europe, is centered on the nutritional qualities of its grain (Hidalgo and Brandolini 2019), the high resistance to pests and diseases (Zaharieva and Monneveux 2014; Serfling ; Chen ), and its adaptation to harsh, low-input environments. Low yields, glumes persistence after harvest (hulledness) and a limited knowledge of its diversity explain why this crop is not properly exploited (Zaharieva and Monneveux 2014). Nevertheless, einkorn landraces were selected and grown in isolated environments and were not subjected to the “bottleneck” effects of intensive selection, thus even today display a broad range of variation for many traits. The panel of landraces used in this work was recently characterized from the phenotypic point of view (Volante et al. 2020); a deeper investigation of the genetic bases underlying such diversity may unravel valuable genes, potentially useful for wheat improvement. Linkage and association mapping rely necessarily on the availability of a great number of molecular markers with known genetic or physical position; a high-resolution einkorn genetic map was recently published (Marino et al. 2018), but only few markers (849) are in common with those included in the panel used in this work. By exploiting the high co-linearity of the genomes of T. monococcum and T. urartu (Marino ), the latter being the donor of the A genome to bread wheat, we aligned the sequences of the DArT markers from a panel previously developed and described (Brandolini ) on the annotated genome of bread wheat (IWGSC, 2018), being this latter the most exhaustive and comprehensive genome available for wheats so far. In such a way, a physical position could be approximated for 2037 out of the 12,734 markers used for the GWAS analysis (corresponding to 16% of the panel). We decided to use very stringent parameters for the BLAST analysis, despite knowing that we would lose most of the alignments, but due to the genetic distance between the two species (Hyun ), we wanted to be as rigorous as possible and to give very robust physical localizations. In the exploitation of T. urartu genome the percentage of physically mapped markers was lower (13%: 1673) but the marker order along chromosomes of einkorn physical map was conserved with that of T. aestivum genome (Supplementary Table S4_a). These observations support our choice to use the bread wheat genome as the most appropriate tool to give a physical order to einkorn markers. The association analysis, corrected for the population structure, was therefore undertaken using the markers thus physically mapped, together with those with unknown position. The pairwise LD (r2) observed in the panel decayed to values lower than 0.2 (generally assumed as a r2 value between unlinked loci) at 176.11 Kbp (as an average value across all the chromosomes). Such an r2 is notably small, compared to what is generally observed in wheat species such as bread wheat (8 Mbp as an average among A, B, and D genome; Liu ), and durum wheat (from 1.6 Mb to 4.5 Mb; Maccaferri ), while for rice similar values have been observed (150 Kb to 2 Mb for the japonica group; Courtois ; Kumar ). Strongly selected crops typically show high levels of LD, due to increased correlation among alleles at specific loci (Schlötterer 2003); conversely, a short decay can be expected in a panel including landraces, which usually have a long history of isolated cultivation and were not or were poorly subjected to breeding. Accordingly, Hao detected a lower LD in bread wheat landraces compared to modern cultivars; therefore, the results observed are not surprising because einkorn has a completely different history of domestication from “more successful” crops. However, it is remarkable that a significant percentage of the accessions used in this work (26.7%) were not assigned to a specific cluster by the STRUCTURE analysis, suggesting that a certain percentage of admixture was present. The marker set used in this work (composed of 12,734 DArTseq) allowed to identify a high number of loci associated to the traits analyzed. We investigated whether the significantly associated regions (the interval defined by the peak marker ± 176 kbp, according to the extent of the LD decay) found in this study were located at the same or close physical position of previously identified genes or QTL with a known function. The regions HDA.07/01 identified by marker SIL_5003498 (115.9 Mb) and HDA.07/02 identified by SIL_5006335 (571.1 Mb) were consistent respectively with two QTL located on Chromosome 7 at 119.5 Mb (AX-94419768) and at 576.1–581.2 Mb (qHD7A.4) reported previously in bread wheat (Turuspekov ; Pang ) (Table 4). For plant height, the region PLH.07/01 detected by SNP_994119 on Chromosome 7 at 545.9 Mb was previously identified by Pang as qPH7A.5 (3.6-3.7 Mb) (Table 4).
Table 4

 Comparison between the marker-trait associations (MTAs) detected in this work and QTL from literature

Correspondence of MTAs detected in this work with QTLs from literature
TraitMTA IDPeak markerChromosomePosition (bp)aQTLPositionDistance (Mb)Reference
HDA HDA.7/01 SIL_5003498 7 115,960,770 AX-94419768 119.5 Mb 3.6 Turuspekov et al. ( 2017)
HDA HDA.7/02 SIL_5006335 7 571,054,862 qHD7A.4 576.1–581.2 Mb 6 Pang et al. ( 2020)
PLH PLH.7/01 SNP_994119 7 545,935,056 qPH7A.5 539.9–540 Mb 5 Pang et al. ( 2020)
GLHGLH.1/02SNP_1000577211537,093,899AX-94618537511.1 Mb26 Sheoran et al. (2019)
SPSSPS.7/01SNP_9965217464,570,003qSNS7A.2433.3–433.6 Mb31 Pang et al. (2020)
SDSSDS.5/01SNP_10257735665,609,700AX-94927055697.7 Mb32.1 Yang et al. (2020)
PROPRO.3/01SIL_11252203608,744,656AX-95629522649.7 Mb41 Yang et al. (2020)

For each known QTL considered, the position and distance from the corresponding MTA are reported. Boldface MTAs are located at <10Mb from the cited QTL reported in literature.

On IWGSC RefSeq v1.0.

Comparison between the marker-trait associations (MTAs) detected in this work and QTL from literature For each known QTL considered, the position and distance from the corresponding MTA are reported. Boldface MTAs are located at <10Mb from the cited QTL reported in literature. On IWGSC RefSeq v1.0. A survey of the literature suggests that the regions identified on chromosomes 3 (608.8 Mb) PRO.03/01 and 5 (665.6 Mb) SDS.05/01 are in proximity of (although not overlapping) those identified by Yang in very recent multi-locus GWAS (at 649.7 Mb for grain protein content—AX-95629522; and at 697.7 Mb for SDS-sedimentation volume—AX-94927055) (Table 4). The same considerations can be done for GLH on chromosome 1 (537.1 Mb; GLH.01/02) and for SPS on chromosome 7 (464.6 Mb; SPS.07/01) that appear to be in the proximity to regions associated to, respectively, glume pubescence at 511.1 Mb—AX-94618537 (Sheoran ) and spike/spikelet number at 433.3 Mb (Pang ) (Table 4). The two regions identified for AWN on chromosome 5 (522.1 and 551.4 Mb) are definitely not included into the interval of B1 gene recently cloned (from 681.5 to 706.8 Mb; DeWitt ). Since the remaining regions detected in this study were not reported in previous GWAS works, we hypothesize that most of the MTAs observed in this study, and shown in Table 3, define new and undescribed regions controlling the einkorn traits analyzed. Moreover, several markers, including both mapped and un-mapped, resulted associated to multiple traits (Supplementary Table S4_d), suggesting possible pleiotropic effects of the associated QTL. It would be interesting to investigate deeper these overlapping, especially HDA-SPS and PLH-SPL, trait-association already described in bread wheat (Chai ; Chen ) Overall, these observations deserve further investigations and could be exploited both in the breeding of this crop and to transfer valuable alleles into more widespread species such as bread wheat.

Conclusion

An approach based on co-linearity with bread wheat was adopted to achieve a set of einkorn molecular markers with an inferred physical position; these, together with the phenotypic information from a 4 years field trial experiment, allowed to undertake an association mapping survey on a species poorly characterized from the genetic point of view. Indeed, this is the first report of a GWAS approach on einkorn using a high number of molecular markers and an extensive panel of landraces. The analyses yielded several einkorn genome regions putatively involved in the regulation of phenological, morphological, quality, and production-related traits, which deserve a deeper characterization from the functional point of view. For seven associated regions, we also found confirmation of our results in the literature on bread wheat, despite the remarkable genetic distance between the two species, and this further validates the robustness of the method. Starting from this information, a set of specific markers such as Cleaved Amplified Polymorphic Sequences (CAPSs), or markers amenable for automation and high-throughput approaches (such as Kompetitive Allele Specific PCR: KASPar markers) may be developed and used as valuable tools for the marker-assisted breeding of einkorn wheat. Unfortunately, no consideration can be done for many other markers, despite significant association P-values, because they were not mapped on the bread wheat genome, and only few could be positioned either on T. urartu genome (18) or einkorn contigs (48) (Supplementary Table S4_c). Nevertheless, it should be kept in mind that we adopted very stringent mapping criteria, which allowed a very accurate positioning of the markers but, on the other hand, led to define the most part of these as “unmapped”. Despite this, we found several markers influencing multiple traits (e.g., heading date with number of spikelets per spike, and spike length with plant height), and these co-regulation mechanisms found support in literature further validating our analysis. Moreover, when the mapping criteria were loosened, further associations could be mapped and found consistency with literature (data not presented). The alignments performed to the three species let us to identify 3290 einkorn contigs linked to 3655 markers, out of them 1166 and 1412 were physically mapped on T. urartu and T. aestivum genomes, respectively. These results revealed a possible future strategy for ordering and orienting the einkorn contigs currently available but not yet anchored: the results of the BLAST alignments (maybe loosening selection criteria) performed for the whole marker dataset (also including those not passing the callrate and minor allele frequency thresholds adopted for the association mapping) coupled with the available genetic information could be exploited for this purpose. This would boost the molecular study of still unexplored traits and unlock the efficient use of einkorn as a valuable repertoire of genetic alleles for the breeding of modern crops.

Authors contributions

A.V. performed the analysis of population structure and GWAS, wrote the first manuscript draft and managed the revision process. D.B. performed the blast of the DArT markers against the wheat genomes (T. aestivum, T. urartu, and T. monococcum) and contributed to manuscript writing. A.V. and D.B. investigated the novelty of the MTAs identified. R.M. contributed to the statistical analyses, to writing and critically revising the manuscript. A.B. conceived and supervised the work, managed in-field trials and phenotyping and revised the manuscript. All the authors read and approved the final manuscript. A.V. and D.B. contributed equally to the paper.

Code availability (software application or custom code)

The original codes cited in the reference, when specified, were used.

Data availability

Supplemental files available at figshare: https://doi.org/10.25387/g3.14975118. Table S1 includes the list of the accessions tested, their origin and the cluster assigned by the Structure analysis. Table S2 shows the physical position on bread wheat genome (IWGSC RefSeq V1.0) assigned to DArT markers of Triticum monococcum. Table S3 shows the results of the Genome-Wide Association Mapping analysis for markers unmapped on bread wheat genome. Table S4 shows the physical position of DArT markers on T. urartu genome (a; GCA_003073225.1) and T. monococcum contigs (b; TGAC WGS monococcum v1). In the third sheet (c), the physical position of the peak marker-trait associations (both mapped on bread wheat genome and unmapped) is reported. In the fourth sheet (d), markers associated to multiple traits are listed together with their physical position on the three considered genomes. File S1 reports the Analysis Of Variance (ANOVA) of the phenotypic data. File S2 contains the genotypic data (DArTseq) of the einkorn panel used for the analysis of population structure, analysis of linkage disequilibrium decay and association mapping. Only the markers passing the callrate (90%) and Minor Allele Frequency (5%) thresholds are shown. File S3 shows the Manhattan- and QQ plot resulting from the association analyses for all the considered traits.
  54 in total

1.  Geographic differentiation of domesticated einkorn wheat and possible Neolithic migration routes.

Authors:  A Brandolini; A Volante; M Heun
Journal:  Heredity (Edinb)       Date:  2016-05-11       Impact factor: 3.821

2.  Molecular linkage map of Einkorn wheat: mapping of storage-protein and soft-glume genes and bread-making quality QTLs.

Authors:  B Taenzler; R F Esposti; P Vaccino; A Brandolini; S Effgen; M Heun; R Schäfer-Pregl; B Borghi; F Salamini
Journal:  Genet Res       Date:  2002-10       Impact factor: 1.588

3.  Identification of variation in adaptively important traits and genome-wide analysis of trait-marker associations in Triticum monococcum.

Authors:  Hai-Chun Jing; Dmitry Kornyukhin; Kostya Kanyuka; Simon Orford; Anastasiya Zlatska; Olga P Mitrofanova; Robert Koebner; Kim Hammond-Kosack
Journal:  J Exp Bot       Date:  2007       Impact factor: 6.992

4.  Dissection of two quantitative trait loci with pleiotropic effects on plant height and spike length linked in coupling phase on the short arm of chromosome 2D of common wheat (Triticum aestivum L.).

Authors:  Lingling Chai; Zhaoyan Chen; Ruolin Bian; Huijie Zhai; Xuejiao Cheng; Huiru Peng; Yingyin Yao; Zhaorong Hu; Mingming Xin; Weilong Guo; Qixin Sun; Aiju Zhao; Zhongfu Ni
Journal:  Theor Appl Genet       Date:  2018-09-28       Impact factor: 5.699

5.  Exploring and Mobilizing the Gene Bank Biodiversity for Wheat Improvement.

Authors:  Deepmala Sehgal; Prashant Vikram; Carolina Paola Sansaloni; Cynthia Ortiz; Carolina Saint Pierre; Thomas Payne; Marc Ellis; Ahmed Amri; César Daniel Petroli; Peter Wenzl; Sukhwinder Singh
Journal:  PLoS One       Date:  2015-07-15       Impact factor: 3.240

6.  GWAS for plant growth stages and yield components in spring wheat (Triticum aestivum L.) harvested in three regions of Kazakhstan.

Authors:  Yerlan Turuspekov; Aida Baibulatova; Kanat Yermekbayev; Laura Tokhetova; Vladimir Chudinov; Grigoriy Sereda; Martin Ganal; Simon Griffiths; Saule Abugalieva
Journal:  BMC Plant Biol       Date:  2017-11-14       Impact factor: 4.215

7.  Genome-Wide Association Studies in Diverse Spring Wheat Panel for Stripe, Stem, and Leaf Rust Resistance.

Authors:  Deepender Kumar; Animesh Kumar; Vinod Chhokar; Om Prakash Gangwar; Subhash Chander Bhardwaj; M Sivasamy; S V Sai Prasad; T L Prakasha; Hanif Khan; Rajender Singh; Pradeep Sharma; Sonia Sheoran; Mir Asif Iquebal; Sarika Jaiswal; Ulavappa B Angadi; Gyanendra Singh; Anil Rai; Gyanendra Pratap Singh; Dinesh Kumar; Ratan Tiwari
Journal:  Front Plant Sci       Date:  2020-06-03       Impact factor: 5.753

8.  Genome-wide association mapping of root traits in a japonica rice panel.

Authors:  Brigitte Courtois; Alain Audebert; Audrey Dardou; Sandrine Roques; Thaura Ghneim-Herrera; Gaëtan Droc; Julien Frouin; Lauriane Rouan; Eric Gozé; Andrzej Kilian; Nourollah Ahmadi; Michael Dingkuhn
Journal:  PLoS One       Date:  2013-11-05       Impact factor: 3.240

9.  Mapping and characterization of wheat stem rust resistance genes SrTm5 and Sr60 from Triticum monococcum.

Authors:  Shisheng Chen; Yan Guo; Jordan Briggs; Felix Dubach; Shiaoman Chao; Wenjun Zhang; Matthew N Rouse; Jorge Dubcovsky
Journal:  Theor Appl Genet       Date:  2017-11-21       Impact factor: 5.699

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.