
Genomic consequences of artificial selection during early domestication of a wood fibre crop.

Marja M Mostert-O'Neill¹, Hannah Tate¹, S Melissa Reynolds¹, Makobatjatji M Mphahlele¹,², Gert van den Berg³, Steve D Verryn⁴, Juan J Acosta⁵, Justin O Borevitz⁶, Alexander A Myburg¹.

Abstract

From its origins in Australia, Eucalyptus grandis has spread to every continent, except Antarctica, as a wood crop. It has been cultivated and bred for over 100 yr in places such as South Africa. Unlike most annual crops and fruit trees, domestication of E. grandis is still in its infancy, representing a unique opportunity to interrogate the genomic consequences of artificial selection early in the domestication process. To determine how a century of artificial selection has changed the genome of E. grandis, we generated single nucleotide polymorphism genotypes for 1080 individuals from three advanced South African breeding programmes using the EUChip60K chip, and investigated population structure and genome-wide differentiation patterns relative to wild progenitors. Breeding and wild populations appeared genetically distinct. We found genomic evidence of evolutionary processes known to have occurred in other plant domesticates, including interspecific introgression and intraspecific infusion from wild material. Furthermore, we found genomic regions with increased linkage disequilibrium and genetic differentiation, putatively representing early soft sweeps of selection. This is, to our knowledge, the first study of genomic signatures of domestication in a timber species looking beyond the first few generations of cultivation. Our findings highlight the importance of intra- and interspecific hybridization during early domestication.


Keywords: artificial selection; domestication; eucalypt; forestry; population genomics; selection signatures


Year: 2022 PMID: 35657639 PMCID: PMC9541791 DOI: 10.1111/nph.18297

Source DB: PubMed Journal: New Phytol ISSN: 0028-646X Impact factor: 10.323

Introduction

Understanding changes in genomic architectures underlying the domestication of plants aids in the discovery of genetic targets for crop improvement and enhances our knowledge of the evolutionary forces involved in species adaptation (Ross‐Ibarra et al., 2007; Purugganan & Fuller, 2009; Olsen & Wendel, 2013). For most domesticates, the genotypes intermediate between wild and domesticate are missing. In some cases, even the wild progenitors remain disputed (Cornille et al., 2012; Wu et al., 2014), complicating efforts to untangle the evolutionary forces that shaped the genomes of domesticates and to detect genomic signatures of artificial selection. Current breeding practices in plantation forestry (Isik et al., 2015) mimic those of early fruit and annual crop domestication, including exploitation of interspecific hybrids (Wu et al., 2014), genetic infusions (intentional introduction of unrelated genetic diversity from the same species) from wild, unimproved genotypes (Cornille et al., 2012; Hufford et al., 2012), and vegetative propagation of favourable genetic combinations (Myles et al., 2011; Cornille et al., 2014). In addition to traits associated with the general plant domestication syndrome, such as determinate growth with reduced branching and reallocation of resources to the harvested parts of the plant (Ross‐Ibarra et al., 2007), forest tree domestication could also include changes in wood properties, such as wood density and wood chemistry, where breeders directly selected for such traits (Tuskan, 2007; Thomas et al., 2018). Most cultivated forestry species are fewer than three generations removed from their wild progenitors.
As such, genetic investigations have focused on early responses to cultivation (Jones et al., 2006; Bouffier et al., 2008; Varghese et al., 2009; De La Torre et al., 2014; Skrøppa & Steffenrem, 2016) or genomic responses to natural selection (Prunier et al., 2011; Evans et al., 2014; Acosta et al., 2019; Collevatti et al., 2019; Wang et al., 2020). An exception is the domestication of Eucalyptus grandis, a forestry species that has been grown and bred ex situ for over a century, representing a unique opportunity to observe the genomic consequences of ongoing formal and informal artificial selection early in the domestication process.

Cultivation of E. grandis as a timber and wood fibre crop has been ongoing for over 100 yr in various exotic environments around the world (Bennett, 2011). From its origins in Australia, the species has been transplanted to every continent except Antarctica (Marco, 1991; Rockwood & Meskimen, 1991; Huoran et al., 1992; Chaix et al., 2003; Hunde et al., 2003; Dos Santos et al., 2004; Verryn et al., 2009; Luo et al., 2010; Boulay et al., 2012; Santos et al., 2017). Its fast growth has been further improved in exotic breeding programmes where artificial selection resulted in trees reaching harvestable age 10–15% earlier (Verryn, 2002), and produced increases of 16% in stem volume per generation of breeding (Meskimen, 1983). These improvements in growth resulted, in part, from the selection of genotypes better adapted to the exotic environment (Rockwood & Meskimen, 1991) and an expanded range of genotypes produced by intraspecific hybridization resulting from crosses between individuals from different provenances. Other economically important traits such as stem form and wood properties were also improved by artificial selection (Verryn et al., 2009). Quantitative genetics studies of early E. grandis breeding trials have therefore indicated substantial genetic gains for production phenotypes, but it is not clear how these genetic gains have manifested in the genomes of these trees.

As reviewed by Ross‐Ibarra et al. (2007), Purugganan & Fuller (2009) and Olsen & Wendel (2013), most domestication studies use one of two broad strategies to identify candidate genetic variants that can subsequently be used for functional verification of their role in domestication traits and/or targeted for crop improvement. The first involves quantitative trait locus (QTL) mapping or genome‐wide association studies to identify genomic regions associated with a trait of economic importance. Published examples of these so‐called top‐down studies (starting with the trait to identify underlying genes) in E. grandis and its hybrids include the detection of QTLs associated with vegetative propagation (Grattapaglia et al., 1995; Marques et al., 2002), growth and wood properties (Grattapaglia et al., 1996; Rocha et al., 2007; Kullan et al., 2012), and resistance to pests and pathogens (Alves et al., 2012; Mhoswa et al., 2020). These strategies generally detect large‐effect loci segregating in a particular family. To aggregate the genetic variation of genome‐wide small effects underlying quantitative phenotypes, genomic selection has also been used in the species (Mphahlele et al., 2020). Complementing this, the second strategy starts by comparing genome‐wide patterns of genetic diversity and differentiation among and between domesticated and wild progenitor populations to identify regions of the genome that show signatures of selection (Ross‐Ibarra et al., 2007; Purugganan & Fuller, 2009; Olsen & Wendel, 2013). Gene Ontology (GO) terms (Ashburner et al., 2000) associated with genes within these regions can subsequently reveal the biological processes under artificial selection.
Since this bottom‐up strategy is phenotype‐naïve, it could also reveal traits that have been selected unintentionally. As reviewed by Cutter & Payseur (2013), the number of genes underlying the selected traits, strength of selection on individual loci, recombination rates and number of generations determine our ability to decipher the genomic footprints left by recurrent selection. Furthermore, this strategy requires extensive genomics resources including genome‐wide genotyping tools and an annotated reference genome for the identification of genes in linkage with genomic loci under selection. The economic importance of E. grandis as a wood fibre crop, as a pure species or as a hybrid partner, has led to the development of numerous transcriptomic (Mangwanda et al., 2015; Oates et al., 2015; Vining et al., 2015) and genomic resources, including annotated nuclear and organellar genome sequences (Myburg et al., 2014; Bartholomé et al., 2015; Pinard et al., 2019b) and a high‐throughput EUChip60K single nucleotide polymorphism (SNP) array (Silva‐Junior et al., 2015). This raises the possibility of combining trait‐based gene discovery efforts with a bottom‐up approach to uncover genomic regions under artificial selection in the genomes of E. grandis individuals in early domestication. South Africa has some of the most advanced E. grandis breeding programmes globally. In this study, we investigate the genomic consequences of a century of cultivation in three such programmes. The three populations share a common gene pool, which originated from multiple seed imports from Australia starting as early as 1896 (Bennett, 2011). 
Formal breeding programmes, where growth and sawn timber quality were under selection, commenced in the 1960s with dedicated provenance trials using newly imported seed lots from across most of the latitudinal range of the species (Poynton, 1979), and material that has been selected and advanced informally in South Africa for up to five generations previously. These programmes followed tree improvement methodologies later described by Zobel & Talbert (1984), with an average breeding age of 8 yr (although E. grandis can flower from as young as 3 yr depending on field conditions). The breeding objective was to make genetic gains whilst maintaining genetic diversity (since forest trees have high genetic load and suffer inbreeding depression), and as such, the top‐performing individual(s) for each family were advanced and on average 300 families were maintained and selected based on family means within and across sites. In the process, some families would not go forward in the breeding programme. Since the 1990s, this germplasm was advanced for three to five generations in separate private breeding programmes by forestry companies Hans Merensky, Mondi and Sappi (Fig. 1a).

Fig. 1

Genetic differentiation and population structure of breeding Eucalyptus grandis relative to wild progenitors and potential introgressing species. (a) Diagram of the plantation and breeding history of the three South African E. grandis populations, TZA, ZUL and KZN, with main end‐product (turquoise shade) and known biotic challenges (pale yellow shade) given below, and sources of genetic change (breeding practices, intentional genetic infusions and unintentional introgression) given above the main timeline. Sources: ¹Bennett (2011); ²Van Wyk & Roeder (1978); ³Denison & Kietzka (1993); ⁴Wingfield et al. (2008). (b) Discriminant analysis of principal components (DAPC; see Supporting Information Fig. S1 for supporting BIC plot) at K = 7 with two dimensions shown (24 306 informative single nucleotide polymorphisms (SNPs) used) of core (cluster 1), infused (cluster 2) and introgressed (cluster 6) breeding E. grandis, and North (cluster 7), South (cluster 3) and Mackay (cluster 4) wild subpopulations. Cluster 5 contained other species that could potentially introgress with breeding E. grandis, including E. urophylla, E. saligna and E. grandis × E. urophylla (GU) hybrids as obtained from Silva‐Junior et al. (2015). (c) Population structure principal components analysis plot for the first three principal components (eigenvalues given in parentheses) of all breeding E. grandis and wild progenitor subpopulations (23 661 informative SNPs used, see Fig. S2 for supporting scree plot and https://chart-studio.plotly.com/~Marja/125/#/ for an interactive version), excluding species that could potentially introgress in breeding populations. (d) DAPC analysis at K = 2 of all breeding E. grandis (excluding introgressed individuals), and Northern and Southern wild subpopulations, used for identification of infused breeding individuals (23 661 informative SNPs used).

First, we aim to test the hypothesis that a century of domestication has resulted in E. grandis genotypes that are genetically distinct from their wild progenitors. We also investigate the possibility that interspecific hybridization and recent infusions from unimproved, wild material have contributed to the genetic diversity in South African breeding populations, as is suggested to have occurred in the domestication of other crops (He et al., 2011; Myles et al., 2011; Cornille et al., 2012; Wu et al., 2014; Baute et al., 2015). This is done by elucidating the population structure and genetic differentiation of breeding populations relative to wild E. grandis populations (Mostert‐O’Neill et al., 2021) and species with which E. grandis could have hybridized ex situ (Silva‐Junior et al., 2015). Next, we define the core E. grandis breeding germplasm, representative of the advanced‐generation population that has been under selection for a century. In these trees, we detect genomic regions that show potential signatures of selection by identifying variants that exhibit localized differentiation patterns between breeding and wild populations. We also compare genome‐wide patterns of heterozygosity and linkage disequilibrium (LD) in breeding material with those in wild progenitors as support for potential signatures of selection.

Materials and Methods

Study population, SNP genotyping and population structure

Individuals were sampled from three core, open‐pollination breeding programmes (Table 1). The first population, TZA, consisted of 285 fifth‐generation (since privatization in the 1990s) individuals, representing 282 families, which were bred in the temperate Tzaneen area of the Limpopo province. The second population, ZUL, represented by 43 families with 248 third‐generation individuals, was bred for subtropical climates in Zululand, in northern KwaZulu‐Natal province. The KZN population consisted of a core breeding population of 547 third‐ and fourth‐generation individuals (62 families) established from trees bred in temperate and subtropical sites in KwaZulu‐Natal. DNA was isolated from leaf or cambial tissues using the Nucleospin DNA extraction kit (Macherey‐Nagel, Düren, Germany) and used for SNP genotyping with the EUChip60K chip (Silva‐Junior et al., 2015). Genotypic classes were redefined as described by Silva‐Junior et al. (2015) and informative SNPs (unique map position on v.2 reference genome assembly, minor allele frequency (MAF) > 0.02 and genotyped in at least 90% of individuals) were extracted using the SNP & Variation Suite™ v.8.x (Svs8; Golden Helix Inc., Bozeman, MT, USA). Samples were also checked to ensure that at least 90% of informative markers were successfully genotyped in each individual. Identity by descent analysis in Svs8 (Identity by Descent Estimation. SNP & Variation Suite Manual v.8.x; Golden Helix) was used to confirm half‐sib relationships and to remove full‐sib individuals from over‐represented families. Only one such family, from KZN, was identified, with nine putative full‐sibling individuals, of which only one was retained for subsequent analysis (results not shown).
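The marker filter described above can be sketched in a few lines. This is a minimal illustration only, not the Svs8 procedure: the 0/1/2 genotype coding, the function names and the toy data are assumptions, and the unique-map-position criterion is not modelled.

```python
def minor_allele_frequency(genotypes):
    """MAF from calls coded 0/1/2 (copies of one allele); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def call_rate(genotypes):
    """Fraction of individuals with a non-missing call at this SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def informative_snps(snp_calls, min_maf=0.02, min_call_rate=0.90):
    """Names of SNPs with MAF > min_maf that are called in >= min_call_rate of individuals.

    snp_calls maps a SNP name to a list of calls, one per individual.
    """
    return [
        name
        for name, calls in snp_calls.items()
        if minor_allele_frequency(calls) > min_maf and call_rate(calls) >= min_call_rate
    ]


# Toy data (hypothetical): three SNPs typed in ten individuals.
toy_calls = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],        # polymorphic, fully called
    "snp_b": [0] * 10,                               # monomorphic, fails the MAF filter
    "snp_c": [1, None, None, 2, 0, 1, 1, 0, 2, 1],   # 80% call rate, fails the call-rate filter
}
kept = informative_snps(toy_calls)
```

Only `snp_a` passes both thresholds in this toy set; the thresholds are exposed as arguments so that a stricter filter can be tried without changing the functions.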

Table 1

Study populations and collection sites.

Breeding population	Number of families	Number of individuals	Site name	Latitude	Longitude	Elevation (m)†	MAP (mm)	MAT (°C)
TZA	282	285	Rooikoppies	−23.80	30.10	826	965	20
ZUL	43	248	Palm Ridge	−28.32	32.26	60	900	22
KZN	62	208	Siya Qubeka	−28.65	32.15	76	1196	21
		167	Nyalazi	−28.21	32.35	52	999	21
		185	Mtunzini	−29.03	31.66	84	1220	21

TZA, ZUL and KZN, South African Eucalyptus grandis populations.

MAT, mean annual temperature; MAP, mean annual precipitation.

†Elevation was determined based on GPS coordinates using the online resource, MAPS.ie (https://www.maps.ie/coordinates.html).

Population differentiation patterns were investigated and compared among breeding populations, and between breeding populations, wild E. grandis (including 362 individuals from three subpopulations; Mostert‐O'Neill et al., 2021) and other Latoangulatae species as published by Silva‐Junior et al. (2015), using four approaches: (i) principal component analysis (PCA) with normalization to each marker’s standard deviation in Svs8; (ii) sparse nonnegative matrix factorization (sNMF) using the LEA R package (Frichot et al., 2014; Frichot & François, 2015), testing K = 2 to K = 10 with five repetitions of each value and visualizing the minimum cross‐entropy (CE) for each value of K; (iii) discriminant analysis of principal components (DAPC) using the adegenet R package (Jombart, 2008; Jombart et al., 2010), with the Bayesian information criterion (BIC) used to determine the most probable number of clusters in the data set for K = 1 to K = 15; and (iv) quantification of the extent of differentiation among breeding populations, and between breeding and wild E. grandis populations, as the F‐statistic F_ST, as described by Weir & Cockerham (1984), with 95% confidence intervals, in Svs8. Recent introgression, as a consequence of interspecific hybridization, can confound the detection of genomic segments under selection. To detect introgression in South African breeding programmes, population structure was investigated using PCA, sNMF and DAPC with the inclusion of published SNP genotypic data (Silva‐Junior et al., 2015) for other species within the section Latoangulatae (10 E. saligna, 19 E. urophylla and 16 E. grandis × E. urophylla (GU) hybrids).
Suspected introgression was further tested by interrogating individual genotypes for the presence of genomic segments not originating from E. grandis by ancestry mapping using the Efficient Inference of Local Ancestry (eila) R package (Yang et al., 2013) as described by Mostert‐O’Neill et al. (2021) with the breakpoint penalty λ = 30. Briefly, probable ancestry was calculated for each SNP using the same three reference populations as described by Mostert‐O'Neill et al. (2021): E. grandis, non‐E. grandis Latoangulatae or Maidenaria‐like. Next, the cumulative probability estimates of all SNPs on a chromosomal segment were used to assign each segment to one of the three ancestral populations. The penalty for allowing breakpoints (λ), where ancestry switches from one ancestral population to another along a chromosomal segment, was previously optimized based on ancestry mapping conducted in pure species and hybrid individuals for which the ancestry was known (Mostert‐O’Neill et al., 2021). This approach allowed for the identification of even small introgressed genomic segments, which could confound the detection of genomic regions selected during early domestication. For ancestry mapping, only SNPs with zero missing data were used. Individuals with evidence of introgression (presence of genomic segments assigned to non‐E. grandis ancestry) were excluded from subsequent analyses.
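To make the differentiation statistic used in the population comparisons above more concrete, the following sketch computes a simplified, Wright‐style F_ST for one biallelic SNP from the allele frequencies of two populations. It is deliberately not the Weir & Cockerham (1984) estimator applied in the study, which additionally corrects for sample size; the function name and the equal weighting of the two populations are assumptions made for illustration.

```python
def fst_two_populations(p1, p2):
    """Simplified F_ST = (H_T - H_S) / H_T for one biallelic locus.

    p1 and p2 are the frequencies of the same allele in two populations,
    which are weighted equally here (an assumption of this sketch).
    """
    # Mean expected heterozygosity within populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic across both populations
    return (h_t - h_s) / h_t


same = fst_two_populations(0.3, 0.3)      # identical frequencies, no differentiation
fixed = fst_two_populations(0.0, 1.0)     # alternative alleles fixed in each population
partial = fst_two_populations(0.2, 0.8)   # intermediate differentiation
```

Values near 0 indicate shared allele frequencies and values near 1 indicate fixed differences, which is the scale on which estimates such as F_ST = 0.02 should be read.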

Differentiation between wild and breeding populations

To detect genomic regions selected during early domestication, a core breeding population, representing individuals that were differentiated from the wild subpopulations, was identified using DAPC. This also allowed for the detection of individuals that probably shared a more recent ancestry with wild progenitors because of genetic infusions (introduction of wild, unimproved germplasm). Since there was no genetic evidence or historical records indicating that Mackay provenances, previously shown to be genetically distinct with evidence of natural interspecific introgression (see fig. 1 in Mostert‐O'Neill et al., 2021), were ever introduced to South Africa, the Mackay subpopulation was not included among the wild progenitors. Based on the BIC results, DAPC was repeated for K = 2 to K = 4, and K = 2 was used to distinguish samples with recent genetic infusion from wild and breeding material. Group membership probabilities were used to detect breeding individuals that had more than 0.05 probabilistic assignment to the wild E. grandis cluster. To compare population structure resulting from the removal of introgressed and infused individuals, analyses using PCA, sNMF, DAPC and F_ST estimates were repeated on three data sets. The first contained all E. grandis individuals (23 661 informative SNPs). The second contained all E. grandis individuals except the introgressed breeding individuals (23 661 informative SNPs). The third (21 991 informative SNPs) contained the North and South wild subpopulations (Mostert‐O'Neill et al., 2021) and core breeding E. grandis, with recently infused breeding individuals excluded. The last data set was also used for outlier detection. Genetic diversity statistics, including average heterozygosity and inbreeding coefficients, were calculated for retained core breeding E. grandis using hierfstat v.0.04‐22 (Goudet, 2005).
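The membership rule described above, in which a breeding individual is treated as recently infused when its probabilistic assignment to the wild cluster exceeds 0.05, amounts to a simple threshold on the DAPC output. The sketch below assumes the membership probabilities have already been exported as a mapping from individual to wild‐cluster probability; the identifiers and values are hypothetical.

```python
WILD_MEMBERSHIP_THRESHOLD = 0.05  # cut-off stated in the text


def split_core_and_infused(wild_membership, threshold=WILD_MEMBERSHIP_THRESHOLD):
    """Partition breeding individuals by their probability of belonging to the wild cluster.

    Individuals with a probability strictly greater than the threshold are
    flagged as putatively infused; the rest form the core breeding set.
    """
    core, infused = [], []
    for individual, probability in wild_membership.items():
        if probability > threshold:
            infused.append(individual)
        else:
            core.append(individual)
    return core, infused


# Hypothetical membership probabilities for five breeding trees.
toy_membership = {
    "tree_01": 0.00,
    "tree_02": 0.04,
    "tree_03": 0.05,   # exactly at the threshold, so retained as core
    "tree_04": 0.62,
    "tree_05": 1.00,
}
core, infused = split_core_and_infused(toy_membership)
```

Whether the boundary value itself counts as infused is a choice of this sketch; the text only specifies "more than 0.05".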

Chloroplast haplotype diversity in wild and breeding populations

A subset of 361 individuals, representing 175 wild and 186 breeding families (representing introgressed, recently infused and core breeding individuals), were also genotyped using the Axiom™ Euc72K SNP chip through the genomics service provider, Thermo Fisher Scientific (Santa Clara, CA, USA), which allowed genotyping with chloroplast (cp) targeting assays. Of the 175 wild individuals, 14 were not previously genotyped by Mostert‐O’Neill et al. (2021) using the EUChip60K SNP chip (Silva‐Junior et al., 2015) but were instead siblings of previously genotyped individuals. The SNP data were processed using the Axiom™ Analysis Suite (v.3.1 User Guide) and Svs8 to retain cp SNPs that were informative (MAF ≥ 0.05) in at least 95% of the individuals. The informative cp SNP calls were concatenated for each individual to extract the cp haplotypes. Haplotype sequences (concatenated alleles) were exported as Fasta files using Mega X (Kumar et al., 2018) and haplotype networks were analysed following the guidelines of Toparslan et al. (2020) using the pegas R package (Paradis, 2010).
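The haplotype extraction step, in which informative cp SNP calls are concatenated per individual, can be sketched as follows. This is an illustration under stated assumptions rather than the Svs8/Mega X workflow: the data layout, function names and haplotype labels are hypothetical, and the labels do not correspond to the H1 to H15 names used in the study.

```python
from collections import Counter


def build_haplotypes(cp_calls, snp_order):
    """Concatenate each individual's cp alleles in a fixed SNP order."""
    return {
        individual: "".join(alleles[snp] for snp in snp_order)
        for individual, alleles in cp_calls.items()
    }


def label_haplotypes(haplotypes):
    """Label distinct sequences H1, H2, ... by first appearance and count carriers."""
    labels = {}
    for sequence in haplotypes.values():
        if sequence not in labels:
            labels[sequence] = "H" + str(len(labels) + 1)
    counts = Counter(labels[sequence] for sequence in haplotypes.values())
    return labels, counts


# Toy calls at three hypothetical cp SNPs (cp genomes are haploid, so one allele each).
snp_order = ["cp1", "cp2", "cp3"]
toy_cp_calls = {
    "wild_a": {"cp1": "A", "cp2": "G", "cp3": "T"},
    "wild_b": {"cp1": "A", "cp2": "G", "cp3": "T"},
    "breed_a": {"cp1": "C", "cp2": "G", "cp3": "T"},
    "breed_b": {"cp1": "A", "cp2": "G", "cp3": "C"},
}
haplotypes = build_haplotypes(toy_cp_calls, snp_order)
labels, counts = label_haplotypes(haplotypes)
```

The concatenated strings are what would be written to a FASTA file for network analysis; the counts show how many individuals carry each distinct haplotype.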

Identification and functional dissection of genomic outliers

Genome‐wide patterns of LD, measured as the squared correlation (R²) between allelic values at two loci, were determined in Svs8 (LD Pairwise Analysis. SNP & Variation Suite Manual v.8.x. © 2017 Golden Helix) and visualized using LDheatmap (Shin et al., 2006) and Svs8 LD plots for each of the 11 chromosomes, individually. To compare genome‐wide patterns of heterozygosity between breeding and wild populations, Hardy–Weinberg equilibrium (HWE) signed R values, indicative of whether a marker is more homozygous (positive values) or heterozygous (negative values) in the population, were calculated in Svs8 (Signed HWE Correlation R. SNP & Variation Suite Manual v.8.x; Golden Helix) across the breeding and wild populations, and for each population separately. Genomic loci differentiated between wild and breeding E. grandis were identified by comparing allele frequencies of 21 991 SNPs using two approaches: DAPC, to score SNP contributions in differentiating wild and breeding material into K = 2 clusters for each chromosome, separately (Jombart et al., 2010); and marker‐specific F_ST estimates as calculated using Svs8 based on the algorithm by Weir & Cockerham (1984). Loci were considered high‐confidence outliers if they were within the 99th percentile of both outlier detection methods. A Wilcoxon signed‐rank test was performed in R (v.3.5.1; R Development Core Team, 2018) to determine whether the mean of the outliers differed significantly from the mean of the remaining SNPs for DAPC SNP contribution scores, marker‐specific F_ST and HWE signed R values because loci under directional selection are expected to be more homozygous. Outlier detection results were visualized using Tableau Desktop (Professional Edition ©2020). The breeding population consisted of 514 individuals (after removal of introgressed and recently infused individuals), and the wild progenitors were represented by 317 individuals from the Northern and Southern subpopulations.
Next, genes up‐ and downstream of high‐confidence outliers were interrogated for GO functional enrichment against the full SNP‐captured gene set as described by Pinard et al. (2019a) and Mostert‐O’Neill et al. (2021). Two sets of genes, within 2 and 6 kb, were analysed based on the lower and upper estimates of LD decay as determined by Silva‐Junior & Grattapaglia (2015), to account for large variations in genome‐wide LD patterns in the breeding populations. Detailed interrogation of allele and genotype frequencies of outlier SNPs in LD with genes that showed functional enrichment for photosynthesis led us to question whether some SNP probes on the EUChip60K SNP chip had targeted organellar genome sequences in addition to nuclear genome targets. Basic Local Alignment Search Tool for nucleotides (Blastn) analysis (Altschul et al., 1990) was performed for all 57 567 SNP probe sequences that had unique mapping locations in the reference nuclear genome (Myburg et al., 2014; Bartholomé et al., 2015) against the E. grandis plastid and mitochondrial genome sequences (Pinard et al., 2019b). Thereafter, outlier detection and GO enrichment analyses were repeated for 21 938 SNPs, excluding those with potential organellar genome targets. Population structure and differentiation analyses were also repeated with 53 organellar genome‐targeting SNPs excluded, with no noticeable change to the results. Loci putatively under selection were also detected by a multivariate approach using the pcadapt R package (Luu et al., 2017). This approach did not require predefined grouping of individuals, as in the case of DAPC SNP contribution scores and F ST estimates. Instead, outliers were identified, for each chromosome separately, based on the Mahalanobis distance test statistic as differentiated from allele frequencies correlated with the first two principal components (K = 2) in a population structure PCA. 
Control for false discovery rate was done using the qvalue R package (Dabney et al., 2010) and loci with q‐values < 0.05 were considered outliers. To determine the effect that different subpopulations had on the outliers detected, pcadapt scans were repeated with sequential exclusion of each of the breeding and wild subpopulations. A pcadapt scan was also repeated using wild subpopulations only, to detect outliers differentiated between the Northern and Southern wild subpopulations.
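The high‐confidence outlier rule used in this section, where a locus must fall within the 99th percentile of both the DAPC SNP contribution scores and the marker‐specific F_ST estimates, can be sketched as below. The linear‐interpolation percentile, the function names and the toy scores are assumptions for illustration; the study's own percentile definition may differ.

```python
def percentile(values, q):
    """q-th percentile (0-100) of values, using linear interpolation between ranks."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def joint_outliers(dapc_scores, fst_scores, q=99.0):
    """SNPs at or above the q-th percentile cut-off of BOTH score sets."""
    dapc_cut = percentile(dapc_scores.values(), q)
    fst_cut = percentile(fst_scores.values(), q)
    return sorted(
        snp
        for snp in dapc_scores
        if dapc_scores[snp] >= dapc_cut and fst_scores[snp] >= fst_cut
    )


# Toy scores for 200 hypothetical SNPs: both statistics increase with the index,
# except that snp_198 has a low F_ST and snp_5 a high F_ST.
dapc = {f"snp_{i}": float(i) for i in range(200)}
fst = {f"snp_{i}": float(i) for i in range(200)}
fst["snp_198"] = 0.0
fst["snp_5"] = 198.0
high_confidence = joint_outliers(dapc, fst)
```

In the toy data `snp_198` is extreme only for the DAPC score and `snp_5` only for F_ST, so neither is retained; requiring agreement between two methods trades sensitivity for fewer method‐specific false positives.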

Results

SNP genotyping and population structure

Of the 60 639 SNPs assayed, 24 306 were informative (MAF > 0.02, unique mapping position in the reference genome, called in at least 90% of individuals), and 2631 had zero missing data across the three breeding populations, the wild E. grandis and other Latoangulatae species. Most of the E. grandis breeding material appeared to be genetically distinct from wild E. grandis subpopulations in the sNMF analysis at K = 3 (Supporting Information Figs S1, S2). Some breeding individuals appeared to group away from the main E. grandis breeding cluster (Fig. 1b,c) towards the E. urophylla and GU hybrid clusters in the population structure PCA plot (Fig. S1a) and DAPC analysis at K = 7, suggesting interspecific introgression. In particular, 163 of the 248 ZUL individuals had genomic assignment to E. urophylla and GU hybrids according to sNMF analyses from K = 2 (Fig. S1c). Ancestry mapping confirmed that these individuals had genomic segments assigned to non‐E. grandis ancestry (such segments are visible as nonzero values in Table S1). Of the 1080 individuals from the three breeding programmes, only 685 had no introgressed genomic regions detected by ancestry mapping. Genetic infusion of wild, unimproved germplasm is common practice in forestry breeding, to introduce potentially adaptive genetic variation and reduce inbreeding. Since recent infusions can conceal genomic regions that are differentiated between breeding and wild populations in response to artificial selection, the next aim was to identify individuals that appeared to have recently introduced wild ancestry. Joint interrogation of DAPC and sNMF (Fig. S2) analyses (excluding introgressed individuals) indicated a separation between the majority of the breeding germplasm and wild progenitor populations (Fig. 1d), with a set of breeding individuals, predominantly from TZA, appearing to share breeding and wild ancestry.
Furthermore, of the three breeding populations, TZA appeared to be the least differentiated from the Southern wild subpopulation (F_ST = 0.02, Fig. S3). The putatively infused breeding individuals also grouped between the main breeding cluster and the wild progenitor subpopulations in PCA plots (Figs 1c, S1a, S2a). Since genetic infusions aim to introduce adaptive genetic variation into breeding populations, we also wanted to determine the origin of the infused germplasm. We were able to distinguish breeding samples that had wild ancestry derived from provenances in the Southern (light blue shade) vs the Northern (purple shade) subpopulations at K = 3 and K = 4 in the sNMF analysis (Fig. S2b) and confirmed these results by DAPC at K = 3 (Table S2b). Specifically, 98 TZA, 24 ZUL and two KZN individuals grouped with the Southern wild progenitor population cluster, while 16 TZA and ZUL individuals were assigned to the Northern wild subpopulation cluster. At K = 2 in DAPC analysis (excluding introgressed breeding and Mackay individuals), a separation between wild and the main breeding clusters was observed, with the suspected infused individuals either completely or partially assigned to the wild cluster (Fig. 1d; Table S2). A total of 514 individuals (92 from TZA, 49 from ZUL and 373 from KZN) were retained as the core breeding germplasm (referred to as the E. grandis retained or core breeding population) for further analyses. We considered this a single group because there was no observable genetic differentiation among the three breeding populations once infused and introgressed individuals were removed (results not shown). To quantify the extent of genetic differentiation between the core breeding germplasm, the wild progenitor populations and the other species within section Latoangulatae, F_ST estimates were calculated for all of these comparisons (shown in Fig. S3).
The core breeding population was as differentiated from the wild progenitors as the wild subpopulations were from each other (Fig. S3c). The breeding populations had negative average inbreeding coefficients (i.e. higher observed heterozygosity than expected), which could be explained by novel genetic diversity from intraspecific (interprovenance) hybrids being advanced in the breeding programmes (Table S3). Chloroplast haplotype diversity analysis was conducted to confirm that the breeding populations originate from a wide sampling of the natural populations, as suggested by breeding records. The analysis revealed 15 unique cp haplotypes based on 24 informative SNPs (Fig. S4). Of these, two (H8 and H14) were each detected in only one core breeding family, and the only cp haplotype present in the Mackay wild subpopulation (H10) was absent from all analysed breeding germplasm. Haplotypes derived from the Northern and Southern wild subpopulations were observed in introgressed, infused and core breeding material.

Genomic regions differentiated between wild and core breeding E. grandis

Genomic regions under artificial selection were expected to be differentiated between the core breeding and wild progenitor populations, leading to changes in SNP marker heterozygosity and LD. The wild population generally had slightly more positive (homozygous) genome‐wide HWE signed R values compared to the breeding population (Fig. S5); however, outlier loci in the breeding population had significantly higher HWE signed R scores (i.e. were more often homozygous) compared to the rest of the SNPs as determined using a one‐tailed Wilcoxon signed‐rank test (P = 6.22e−39, Table S4). High‐confidence differentiated loci were distributed across the genome (Fig. 2a), although it should be noted that peaks of multiple differentiated loci appeared to overlap regions of increased LD in the breeding population on chromosomes 4 and 10 (Figs 2b, S6). Other large regions of increased LD in the core breeding population compared to LD in the wild were observed on chromosomes 2 and 11. Genome‐wide patterns of LD varied noticeably in the breeding population among and within chromosomes (Figs S6, S7), with genome‐wide average decay (R² < 0.2) at 1.8 kb.
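For readers unfamiliar with the LD measure reported here, the sketch below computes the squared correlation between genotype values at two loci. It assumes unphased genotypes coded as 0/1/2 allele counts and is only one way to obtain such a statistic; it is not claimed to be the estimator implemented in Svs8, and the function name is hypothetical.

```python
def ld_r_squared(genotypes_a, genotypes_b):
    """Squared Pearson correlation between 0/1/2 genotype values at two loci.

    Individuals with a missing call (None) at either locus are skipped.
    Returns 0.0 when either locus shows no variation among the paired calls.
    """
    pairs = [
        (a, b)
        for a, b in zip(genotypes_a, genotypes_b)
        if a is not None and b is not None
    ]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    variance_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    variance_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if variance_a == 0 or variance_b == 0:
        return 0.0
    return covariance ** 2 / (variance_a * variance_b)


locus_1 = [0, 1, 2, 1, 0, 2]
linked = ld_r_squared(locus_1, [2, 1, 0, 1, 2, 0])        # perfectly (inversely) correlated
unlinked = ld_r_squared([0, 0, 2, 2], [0, 2, 0, 2])       # no correlation
```

Because the correlation is squared, perfectly inverse genotypes give the same value as identical ones; LD decay is then read as the physical distance at which the average of this statistic over SNP pairs drops below a chosen threshold such as 0.2.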

Fig. 2

Genomic regions differentiated between the core breeding and wild populations. (a) Discriminant analysis of principal components (DAPC) single nucleotide polymorphism (SNP) contributions, indicative of a marker’s informativeness in separating breeding and wild samples into K = 2 clusters (Jombart et al., 2010), and marker‐specific F_ST values as calculated for breeding (excluding introgressed and infused individuals) vs wild progenitors (Northern and Southern subpopulations), are given for each of the 21 991 SNPs with genomic positions given on the x‐axis. In each panel, the 95th and 99th percentile values (determined for outlier detection excluding SNPs with organellar genome targets) for each of the outlier detection methods are indicated as horizontal lines. Markers identified as differentiated in the 95th and 99th percentile in both analyses are indicated as squares and diamonds, respectively, and markers that had potential organellar genome targets are indicated as asterisks (these are included for illustration purposes only and were not considered for population structure and functional enrichment analysis). The colour scale is based on the Hardy–Weinberg equilibrium (HWE) signed R values of each SNP, indicative of whether a marker is more homozygous (green) or heterozygous (blue) across the breeding and wild populations. The third panel provides pcadapt −log10 q‐values for 21 991 SNPs, detected per chromosome. Outliers correlated with PC1 and PC2 are indicated in turquoise and yellow, respectively. (b) The same DAPC SNP loadings, F_ST estimates and pcadapt outliers as shown in (a) for the outlier region on chromosome 4 (position 36 406 226 to 40 449 556). The fourth panel shows HWE signed R values for each marker as calculated in the wild (yellow) and breeding (turquoise) populations to illustrate changes in marker‐specific heterozygosity. Beneath this plot is a physical map of all SNPs and linkage disequilibrium (LD) calculated as the squared correlation (R²) between alleles at two loci in the wild progenitors and three breeding populations, TZA, ZUL and KZN.

Initial outlier detection in 21 991 SNPs revealed 85 loci that were in the 99th percentile of DAPC SNP contributions and marker‐specific F_ST values. Photosynthesis‐related GO terms were enriched among genes within 2 and 6 kb up‐ and downstream of outlier SNPs compared against the full SNP‐captured gene space (Table S5). Detailed interrogation of these outlier SNPs revealed that heterozygous individuals were completely absent from the wild and breeding populations and that the SNP probes had high to complete sequence similarity with the plastid and/or mitochondrial genomes (Table S6). No GO enrichment was found for genes in LD with outliers detected after exclusion of the 53 informative markers that could target the organellar genomes in addition to the nuclear genome.

The large, differentiated region on chromosome 4 was also detected using pcadapt. Outliers in this region were correlated with PC1, along which breeding and wild germplasm grouped separately (Fig. S2a). This 4 Mbp genomic region (from position 36 406 226 to 40 449 556) appears to be differentiated in all three breeding subpopulations as it was still detected when any of the three subpopulations were excluded from the pcadapt scans. This region was not detected when the scan was conducted on the wild germplasm only (Fig. S8); therefore, it is not differentiated between the Northern and Southern wild subpopulations. The region contained 310 genes with no significant GO term enrichment. Other large outlier peaks correlated with PC2 were also identified. For example, the large peak on chromosome 2 appeared to consist of outliers differentiated in the ZUL and KZN subpopulations, since this peak was not observed when either of these subpopulations was excluded from the pcadapt scan. Also, the PC2‐correlated peak on chromosome 10 appeared to be differentiated in KZN, specifically, as this peak was not observed when this subpopulation was excluded (Fig. S8).
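The joint‐percentile criterion used for the initial outlier detection (a marker must fall in the upper percentile of both DAPC contributions and F_ST) can be sketched as follows. This is a minimal illustration with hypothetical names and toy scores; it assumes per‐SNP scores have already been computed, and it sets thresholds after excluding flagged (e.g. organellar‐targeting) markers, as the caption of Fig. 2 describes.

```python
import numpy as np


def joint_outliers(dapc_contrib, fst, pct=99.0, exclude=None):
    """Indices of SNPs at or above the pct-th percentile in BOTH score vectors.

    dapc_contrib, fst: per-SNP scores of equal length.
    exclude: optional boolean mask of SNPs to leave out of the threshold
             calculation and of the result (hypothetical helper argument).
    """
    a = np.asarray(dapc_contrib, dtype=float)
    b = np.asarray(fst, dtype=float)
    keep = np.ones(a.shape, dtype=bool)
    if exclude is not None:
        keep &= ~np.asarray(exclude, dtype=bool)
    thr_a = np.percentile(a[keep], pct)   # threshold from retained SNPs only
    thr_b = np.percentile(b[keep], pct)
    return np.flatnonzero(keep & (a >= thr_a) & (b >= thr_b))


# Hypothetical toy scores for 100 SNPs.
scores = np.arange(100, dtype=float)
print(joint_outliers(scores, scores, pct=95))        # concordant rankings
print(joint_outliers(scores, scores[::-1], pct=95))  # discordant rankings
```

Requiring agreement between two methods is conservative: markers that rank highly under only one statistic are not reported.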

Discussion

The aim of this study was to investigate the genomic consequences of artificial selection of exotic E. grandis populations that have been cultivated and bred ex situ for over 100 yr, representing a woody perennial in the early stages of domestication. This is, to our knowledge, the first study of plantation forestry domestication looking beyond five generations of selective breeding. Although the SNP markers used in our study were sufficient for population structure and differentiation analyses, denser genome‐wide genotyping, such as that achieved by whole‐genome resequencing, will have to be performed to conclusively detect and discern signatures of selection. Still, our genome‐wide investigation suggests that selection footprints would be discernible at this stage of the domestication process.

Domestication studies in other crops and in fruit trees, in particular, suggest that intra‐ and interspecific hybridization have contributed important genetic variation to cultivated populations (He et al., 2011; Myles et al., 2011; Cornille et al., 2012; Wu et al., 2014). Congruent with this, we found evidence of introgression from unintended hybridization, particularly in the ZUL breeding population. Since the 1980s, E. grandis plantations around the world have been challenged by fungal pathogens including Chrysoporthe austroafricana and Coniothyrium cankers (Wingfield et al., 2008). This has led to widespread breeding and deployment of E. grandis × E. urophylla (GU) hybrids, which harnessed disease tolerance from E. urophylla while maintaining the favourable growth characteristics of E. grandis (see Potts & Dungey, 2004, for a comprehensive review of eucalypt hybrid breeding). The ZUL population was specifically bred in a subtropical region where biotic stress caused by these pathogens probably resulted in the selection of E. grandis × GU cryptic hybrids. Consequently, this breeding population is now enriched with introgressed genotypes from E. urophylla. Maintaining pure E. grandis breeding populations may become even harder as more pests and pathogens begin to thrive in subtropical zones, giving cryptic E. grandis‐hybrids an adaptive advantage over pure species genotypes.

Individuals that appeared to be hybrids based on PCA, DAPC and sNMF analyses had extensive non‐E. grandis genomic segments detected by ancestry mapping. Some individuals excluded as potentially introgressed had only small genomic segments assigned as non‐E. grandis in origin and grouped within the core E. grandis PCA and DAPC clusters. The small non‐E. grandis genomic segments in these individuals could have originated from interspecific hybridization, or be the result of incomplete lineage sorting. Extensive gene sequence data would be required to differentiate between these possible sources (Joly et al., 2009; Meng & Kubatko, 2009; Yu et al., 2013). Even where whole‐genome sequence data are available, distinguishing between incomplete lineage sorting and hybridization can be problematic in closely related taxa (e.g. Meleshko et al., 2021), and therefore is beyond the scope of this study. Another consideration is that the SNP chip used in the study, being a multispecies array, was enriched for SNP markers shared by two or more related species (Silva‐Junior et al., 2015). Even though SNP allele frequencies can differ substantially between species, the preferential inclusion of such shared SNPs may have contributed to background levels of shared polymorphism. However, these are unlikely to account for the large genomic segments identified as non‐E. grandis (i.e. hybrid in origin).

Although population differentiation patterns related to potential genetic infusions could also be explained by incomplete lineage sorting, recent genetic infusions of unimproved wild material from Coffs Harbour and Atherton provenances in the Southern and Northern wild subpopulations, respectively, were expected based on breeding records. For example, wild germplasm and unrelated families from other breeding trials were known to have been introduced into the TZA breeding programme, formerly managed by the South African Council for Scientific and Industrial Research (CSIR; Verryn et al., 2009), and germplasm from the Northern wild subpopulation is known to have been introduced into the KZN programme in the 1990s. We observed evidence supporting the presence of genetic infusions (Table S2b; Fig. S2b) in TZA, particularly from the Southern wild subpopulation. The TZA population was selected for solid wood products and bred for temperate climates (Table 1), while ZUL and KZN were, in recent years, mostly bred for pulp‐derived products in subtropical and warm‐ to cool‐temperate climates, respectively. This supports the preferential retention of genotypes originating from the temperate South in TZA, despite breeding records and the cp SNP haplotype network analysis also pointing to recent introductions from Atherton in the Northern wild subpopulation. It is also congruent with the notion proposed by Bennett (2011) that one of the first steps in domestication involves capturing existing adaptive genetic variation that matches seed source and ex situ climates. Interspecific hybridization and continued infusions from wild populations are prevalent in domestication; however, since these events occurred very recently in our study populations, the retention of introgressed and infused individuals could confound and mask genomic signatures resulting from artificial selection over a period of 100 yr. Therefore, we excluded potentially introgressed and recently infused genotypes from subsequent analyses, although these genotypes represent important genetic variation for future selective breeding.

Cocultivation of genotypes originating from different provenances would have resulted in intraspecific (interprovenance) hybrids in subsequent generations. We saw evidence of this as genome‐wide heterozygosity was higher in the core South African breeding germplasm compared to the wild (Table S3; Figs S5, S6), possibly counteracting genetic bottlenecks that could have occurred at the start of domestication relative to each of the wild source populations. Similarly, Jones et al. (2006) observed increased heterozygosity in first‐generation selections of E. globulus, suggesting that intraspecific hybrids were advanced early in domestication. Still, hybridization among individuals from different provenances alone does not explain the clear genetic differentiation of breeding populations from the wild progenitors (Figs 1, S2). Genetic drift and selection could have contributed to the differentiation observed between the breeding and wild populations. Empirical support to differentiate the contributions of these evolutionary forces could come from analysing older breeding material from over 50 yr ago. Sadly, no such material remains in current breeding archives. Assessing the effect and impact of genetic drift might also be difficult considering the repeated introductions of genotypes from diverse, wild populations as is reported to have occurred since the 1960s (Poynton, 1979).

When looking at genomic changes over 10 generations of adaptive domestication in maize, Wisser et al. (2019) described two phases: early fixation of a small number of large‐effect variants followed by gradual allele frequency changes at many loci due to selection of quantitative traits. The latter phase could explain shifts in allele frequencies that resulted in the genome‐wide genetic differentiation between breeding and wild E. grandis material, which probably represent the genomic changes underlying rapid genetic gains achieved early on for highly complex traits (Verryn, 2002; Verryn et al., 2009).

Even when selection pressure is high and large genetic gains are observed phenotypically, selection on complex traits typically does not translate into classic selection signatures, known as hard and soft sweeps (Cutter & Payseur, 2013). Selection sweeps arise in the genome when a novel mutation (Smith & Haigh, 1974) or standing genetic variation (Innan & Kim, 2004; Hermisson & Pennings, 2005) confers a strong selective advantage and becomes fixed in a population (Pritchard et al., 2010). They appear as stretches of elevated homozygosity, increased differentiation (e.g. higher localized F_ST estimates) and increased LD (Cutter & Payseur, 2013), since genetic variants surrounding the locus under selection also become fixed due to genetic hitchhiking (Smith & Haigh, 1974). We uncovered one such region on chromosome 4, with several SNPs differentiated between breeding and wild populations, and elevated LD in all three breeding programmes (Figs 2, S6, S8). We postulate that this region contains variants that were under either negative or neutral selection in the wild but were preferentially (positively) advanced in South Africa, thereby reducing the genetic variation and increasing LD surrounding the selected locus in breeding populations; that is, this may represent an early soft sweep. Because domestication had occurred for approximately five generations of formal breeding preceded by up to as many generations of informal selections, allowing only a limited number of recombination events, this genomic region remains large, limiting our ability to identify candidate genes and biological processes.

Enrichment for photosynthesis‐related GO terms observed in an initial screen of genes in LD with outlier SNPs suggested that several of the EUChip60K SNP probe sets must have additional target sequences in the plastid and/or mitochondrial genomes (Table S6). Gene transfers among different genomes in a cell are well documented for E. grandis (Pinard et al., 2019b). Even though these SNP probes could detect nuclear and organellar sequences, for 30 SNPs, which were polymorphic in the breeding and wild populations, we observed a complete absence of heterozygotes in the wild and breeding populations (Table S7). It is likely that for these SNPs, the genotypes were dominated by organellar genome template, which is in vast excess in genomic DNA samples. It is possible that SNPs targeting organellar and nuclear genome sequences were detected as outliers as they would reflect founder effects if only some provenances were introduced to South Africa. The source provenances that constituted the original seed imports from the first half of the 20th century remain unknown. Since imports and subsequent exchange of genetic material occurred mostly via seed, maternally inherited cp SNPs were used to inform which provenances were introduced to South Africa. The cp haplotype network (Fig. S4) showed that some wild haplotypes (H4, H10, H12 and H15) were not detected in breeding populations while two haplotypes present in the breeding germplasm were not present in the wild material, possibly representing unsampled wild provenances. Although we excluded all of these putative cp‐targeting SNPs from further analyses, we chose to include their outlier detection values in Figs 2 and S6 for illustration purposes.

To conclude, by interrogating genome‐wide SNP allele frequencies in E. grandis breeding and wild populations, we have uncovered genomic evidence of evolutionary processes similar to those that have shaped the genomes of other domesticates. In addition to the genome‐wide genetic differentiation between breeding and wild populations, probably caused by early artificial selection of polygenic traits, we observed localized allele frequency shifts with increased differentiation and LD. A lack of recombination events required to uncouple loci under selection from the neutral genomic background meant that these regions were still too broad for candidate gene identification. Although we used SNPs to tag genomic regions under artificial selection, we know from published reports that the causative variants could have been single nucleotide, presence/absence and copy number variants, as well as other structural variants (see review by Olsen & Wendel, 2013). Additionally, the use of SNP arrays results in the exclusion of rare variants that may be more informative in terms of recent differentiation events (Dokan et al., 2021). Therefore, our future aim is to use sequence‐based genotyping to elucidate structural variants and haplotypes in E. grandis breeding and wild progenitor populations that may be associated with adaptation to ex situ environments and early domestication.

Author contributions

MMM‐O and AAM developed the idea. MMM‐O, SMR, MMM, GvdB, SDV, JJA, JOB and AAM contributed to the design of the study. MMM‐O, SMR, MMM, GvdB, SDV and AAM performed sample collection and generated the data. HT analysed chloroplast marker data and generated chloroplast haplotype networks. MMM‐O analysed the data and wrote the first draft of the manuscript. All authors read, edited and approved the final manuscript.

Supporting Information

Fig. S1 Population structure in relation to wild Eucalyptus grandis and other species in section Latoangulatae based on principal component analysis, discriminant analysis of principal components and sparse nonnegative matrix factorization.
Fig. S2 Breeding Eucalyptus grandis population structure for all breeding samples, those excluding introgressed, and those excluding infused individuals in relation to the wild progenitor populations based on principal component analysis, sparse nonnegative matrix factorization and discriminant analysis of principal components.
Fig. S3 Population differentiation F_ST estimates among breeding Eucalyptus grandis, wild E. grandis and other species in section Latoangulatae.
Fig. S4 Chloroplast (cp) haplotype network based on 24 cp single nucleotide polymorphisms.
Fig. S5 Marker‐specific Hardy–Weinberg equilibrium signed R values of wild vs breeding populations.
Fig. S6 Genomic outliers and linkage disequilibrium plots per chromosome.
Fig. S7 Breeding population linkage disequilibrium decay over genomic distance in kb.
Fig. S8 Outlier detection by pcadapt scan.
Table S1 Ancestry assignment of chromosomal segments.
Table S2 Cluster assignment of samples using discriminant analysis of principal components to identify genetically infused breeding individuals.
Table S3 Summary statistics of genetic diversity using hierfstat v.0.04‐22.
Table S4 Wilcoxon signed rank test P‐values supporting the alternative hypothesis that the mean of the outliers was greater than the mean of the rest of the single nucleotide polymorphisms.
Table S5 Gene Ontology enrichment analysis for genes in linkage disequilibrium with outlier single nucleotide polymorphisms (SNPs) before excluding organellar‐targeting SNPs.
Table S6 Blastn against the organellar genomes.
Table S7 Marker statistics of single nucleotide polymorphisms with multigenome targets.

Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.

56 in total

1. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

2. Plant domestication, a unique opportunity to identify the genetic basis of adaptation.

Authors: Jeffrey Ross-Ibarra; Peter L Morrell; Brandon S Gaut
Journal: Proc Natl Acad Sci U S A Date: 2007-05-09 Impact factor: 11.205

3. adegenet: a R package for the multivariate analysis of genetic markers.

Authors: Thibaut Jombart
Journal: Bioinformatics Date: 2008-04-08 Impact factor: 6.937

4. Genetic mapping of QTLs controlling vegetative propagation in Eucalyptus grandis and E. urophylla using a pseudo-testcross strategy and RAPD markers.

Authors: D Grattapaglia; F L Bertolucci; R R Sederoff
Journal: Theor Appl Genet Date: 1995-06 Impact factor: 5.699

5. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors: Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal: Mol Biol Evol Date: 2018-06-01 Impact factor: 16.240

6. Genome-wide patterns of recombination, linkage disequilibrium and nucleotide diversity from pooled resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis.

Authors: Orzenil B Silva-Junior; Dario Grattapaglia
Journal: New Phytol Date: 2015-06-16 Impact factor: 10.151

Review 7. The domestication and evolutionary ecology of apples.

Authors: Amandine Cornille; Tatiana Giraud; Marinus J M Smulders; Isabel Roldán-Ruiz; Pierre Gladieux
Journal: Trends Genet Date: 2013-11-27 Impact factor: 11.639

8. A Genome-Wide Association Study for Resistance to the Insect Pest Leptocybe invasa in Eucalyptus grandis Reveals Genomic Regions and Positional Candidate Defense Genes.

Authors: Lorraine Mhoswa; Marja M O'Neill; Makobatjatji M Mphahlele; Caryn N Oates; Kitt G Payn; Bernard Slippers; Alexander A Myburg; Sanushka Naidoo
Journal: Plant Cell Physiol Date: 2020-07-01 Impact factor: 4.927

9. Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication.

Authors: G Albert Wu; Simon Prochnik; Jerry Jenkins; Jerome Salse; Uffe Hellsten; Florent Murat; Xavier Perrier; Manuel Ruiz; Simone Scalabrin; Javier Terol; Marco Aurélio Takita; Karine Labadie; Julie Poulain; Arnaud Couloux; Kamel Jabbari; Federica Cattonaro; Cristian Del Fabbro; Sara Pinosio; Andrea Zuccolo; Jarrod Chapman; Jane Grimwood; Francisco R Tadeo; Leandro H Estornell; Juan V Muñoz-Sanz; Victoria Ibanez; Amparo Herrero-Ortega; Pablo Aleza; Julián Pérez-Pérez; Daniel Ramón; Dominique Brunel; François Luro; Chunxian Chen; William G Farmerie; Brian Desany; Chinnappa Kodira; Mohammed Mohiuddin; Tim Harkins; Karin Fredrikson; Paul Burns; Alexandre Lomsadze; Mark Borodovsky; Giuseppe Reforgiato; Juliana Freitas-Astúa; Francis Quetier; Luis Navarro; Mikeal Roose; Patrick Wincker; Jeremy Schmutz; Michele Morgante; Marcos Antonio Machado; Manuel Talon; Olivier Jaillon; Patrick Ollitrault; Frederick Gmitter; Daniel Rokhsar
Journal: Nat Biotechnol Date: 2014-06-08 Impact factor: 54.908

10. Exome Resequencing Reveals Evolutionary History, Genomic Diversity, and Targets of Selection in the Conifers Pinus taeda and Pinus elliottii.

Authors: Juan J Acosta; Annette M Fahrenkrog; Leandro G Neves; Márcio F R Resende; Christopher Dervinis; John M Davis; Jason A Holliday; Matias Kirst
Journal: Genome Biol Evol Date: 2019-02-01 Impact factor: 3.416