
Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites.

Zhijiao Song1,2,3, Miaomiao Zhang2,4, Fagen Li2, Qijie Weng2, Chanpin Zhou2, Mei Li2, Jie Li2, Huanhua Huang5, Xiaoyong Mo4, Siming Gan1,2.   

Abstract

Identification of loci or genes under natural selection is important both for understanding the genetic basis of local adaptation and for practical applications, and genome scans provide a powerful means for this purpose. In this study, genome-wide simple sequence repeat (SSR) markers were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations.

Year:  2016        PMID: 27748400      PMCID: PMC5066178          DOI: 10.1038/srep34941

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Natural populations may survive climate change through migration and local adaptation12. In sessile plants, while migration involves propagule (i.e. seed and plant fragment) dispersal and population establishment in new locations2, local adaptation represents higher fitness of local than nonlocal populations resulting from divergent selection among environments3. As adaptation is often inherent in and interplayed with migration2, migration can not be viewed simply as an alternative to occurrence of local adaptation4. With the occurrence of rapid climate change, the migration rates observed commonly (e.g. 20–40 km per century) are far below the rates projected (e.g. 300–500 km per century) to track future climate shifts1, and local adaptation to future climatic conditions will likely be necessary for long-term in situ population persistence5. Divergent natural selection, as exerted usually by environmental gradients, is the major driving force of local adaptation. Such selection can result in gene frequency changes among populations within a species, ultimately leading to variation in both phenotypic traits and genetic structure even in the presence of gene flow67. Identifying the effects of loci or genes under natural selection is important for elucidating the genetic basis of adaptation to different environments and also for practical applications in biodiversity conservation and selective breeding89. Using a candidate gene or genome scan approach, genomic loci associated with natural selection could be identified through linkage disequilibrium (LD) and quantitative trait locus mapping of adaptive traits, detection of between-population differentiation (FST) outlier loci and analysis of alleles that are correlated with environmental variables810. Genome scans provide a powerful means in population genetics studies to detect outlier loci putatively under selection1112 and identify alleles associated with environmental variables13. 
In contrast to a candidate gene approach, in which a limited set of candidate genes potentially influencing the adaptive trait are used, genome scans investigate a large number of loci throughout the genome even in the absence of phenotypic information14. To date, genome scans have been extensively employed for detecting loci or genes under selection in model organisms (e.g. Zea mays L.15), but its potential still remains to be explored for most non-model species1016. Moreover, while most genome scan studies on natural selection have used high-throughput array-based single nucleotide polymorphism markers, in which the finite number of markers (at least in non-model species) and the marker nature of bi-allelism can limit the genotyping of all variations at a genomic region, few studies have attempted to use multi-allelic markers, such as microsatellites (or simple sequence repeats, SSR). The broadleaved tree genus Eucalyptus L’Hér. (family Myrtaceae) encompasses more than 780 species and subspecies native to the Australian continent and surrounding islands17. It is highly diverse and displays significant adaptability and phenotypic plasticity18. With the projection of pronounced temperature warming and rainfall decline in Australia, eucalypts are particularly interesting for climate change studies, considering their poor dispersal capability and limited gene flow19. Also, Eucalyptus species and hybrids constitute the most widely planted hardwood trees in the world (plantations >20 million ha, http://git-forestry.com/Global_Eucalyptus_Map.htm) and are therefore important for the global forest-related industries. Within Eucalyptus, E. grandis Hill ex Maiden (flooded gum or rose gum) is one of the most important species in terms of breeding and genomics efforts18 and has been selected as the second tree species (only after Populus trichocarpa Torr. & Gray) for whole genome sequencing20. E. grandis is a diploid species with 2n = 22 chromosomes and 640 Mb genome size20. 
It is distributed naturally mainly in coastal areas from northern New South Wales (32° S in latitude) to southern Queensland (26° S) in Australia, with two outlier patches in central (22° S) and northern (16−18° S) Queensland. Its natural forests are mostly on low lands and hills ranging from sea level to ~600 m in altitude21. Of note, E. grandis displays significant variation in adaptive phenotypic traits, such as growth and frost tolerance (e.g. Rockwood & Meskimen22). A relatively low level of population differentiation has been revealed in E. grandis by isozyme markers (GST = 0.12)23, suggesting a weak population structure. Thus far, genomic locus-association analyses have detected loci with significant effects on economic traits in E. grandis and other Eucalyptus species18. However, there have been no association studies specifically aiming at local adaptation in E. grandis. In addition, even in Eucalyptus, only two studies have recently reported candidate loci for diversifying selection in natural populations of E. gomphocephala DC.24 and E. tricarpa (L.A.S. Johnson) L.A.S. Johnson & K.D. Hill25. In this study, we used 110 SSR markers distributed across the genome sequence of E. grandis, including 45 genomic SSR (gSSR) and 65 expressed sequence tag (EST) derived SSR (EST-SSR) markers, to search for molecular footprints of divergent selection in E. grandis natural populations. The genetic diversity and population structure at putatively neutral gSSR loci were first assessed for 16 E. grandis populations across the species range. The FST outlier SSRs were then tested against the 110 SSRs to identify loci potentially involved in divergent natural selection. SSR alleles associated with climatic variables were finally investigated to reveal gene polymorphisms contributing to local adaptation to temperature, isothermality and/or precipitation.

Materials and Methods

Population sampling and DNA extraction

A total of 159 individuals were sampled from 16 provenances across the range of E. grandis (Fig. 1 and Table 1), each provenance representing one population and each individual originating from an open-pollinated mother tree in the native forests. The sample size ranged from nine to twelve individuals per population (Table 1). The mother trees were at least 100 m apart to minimise the risk of collecting seed from closely related trees (R Arnold, Australian Tree Seed Centre, Australia, pers. comm.). The original provenance seedlots had been provided by the Australian Tree Seed Centre (Canberra, Australia). Leaf samples were collected in July 2011 from a 2-year-old E. grandis provenance-family trial located at Zhaoqing City (112°27′ E, 23°03′ N), Guangdong Province, China, and were stored at −80 °C prior to DNA extraction.
Figure 1

Geographic distribution of the 16 Eucalyptus grandis populations studied.

The map was generated using software ArcGIS 10.0 (http://www.esri.com/software/arcgis/). Full description of the populations can be found in Table 1. SF, state forest; ACT, Australian Capital Territory.

Table 1

Eucalyptus grandis populations, their origins and sample size as well as the mean values of three non-correlated climatic variables (during 1950–2000).

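| No. | Code | Population | Latitude (S) | Longitude (E) | Altitude (m) | N | MAT (°C) | MAT partition | IT | IT partition | AP (mm) | AP partition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Pic | Piccaninny CK Windsor, QLD | 16°13′ | 144°58′ | 1,160 | 9 | 19.7 | 5 | 60.1 | 2 | 1,352 | 3 |
| 2 | Cop | Copperlode, QLD | 16°58′ | 145°40′ | 425 | 9 | 21.7 | 4 | 56.9 | 2 | 2,195 | 2 |
| 3 | MS | MT Spec SF Paluma, QLD | 18°56′ | 146°07′ | 850 | 9 | 20.4 | 3 | 54.1 | 4 | 1,001 | 4 |
| 4 | Fin | Finch Hatton Gorge, QLD | 21°04′ | 148°37′ | 200 | 10 | 22.1 | 4 | 51.1 | 1 | 1,336 | 3 |
| 5 | Cre | Credition SF, QLD | 21°13′ | 148°28′ | 720 | 11 | 19.8 | 5 | 51.6 | 1 | 983 | 4 |
| 6 | Kin | Kin Kin, QLD | 26°12′ | 153°10′ | 40 | 9 | 20.8 | 3 | 50.6 | 3 | 1,517 | 1 |
| 7 | Bel | Belli, QLD | 26°29′ | 152°50′ | 100 | 9 | 19.9 | 5 | 51.7 | 1 | 1,498 | 1 |
| 8 | Bor | Borumba Range, QLD | 26°35′ | 152°36′ | 500 | 11 | 18.4 | 1 | 51.7 | 1 | 1,319 | 3 |
| 9 | Con | Connondale, QLD | 26°40′ | 152°36′ | 560 | 10 | 17.5 | 2 | 51.5 | 1 | 1,384 | 3 |
| 10 | Kil | Kilcop Creek, QLD | 26°45′ | 152°35′ | 400 | 10 | 18.0 | 2 | 51.8 | 1 | 1,276 | 3 |
| 11 | MM | MT Mee, QLD | 27°08′ | 152°43′ | 200 | 12 | 19.0 | 1 | 51.9 | 1 | 1,178 | 3 |
| 12 | MT | MT Tamborine, QLD | 27°55′ | 153°11′ | 500 | 12 | 17.7 | 2 | 52.0 | 1 | 1,522 | 1 |
| 13 | ML | MT Lindsay, QLD | 28°21′ | 152°45′ | 340 | 10 | 17.2 | 6 | 51.8 | 1 | 1,262 | 3 |
| 14 | Bag | Bagawa, NSW | 30°07′ | 152°54′ | 440 | 10 | 17.0 | 6 | 50.0 | 3 | 1,841 | 2 |
| 15 | Ora | Orara West SF, NSW | 30°20′ | 153°00′ | 293 | 9 | 17.5 | 2 | 50.3 | 3 | 1,951 | 2 |
| 16 | Bul | Bulahdelah SF, NSW | 32°20′ | 152°15′ | 20 | 9 | 17.7 | 2 | 50.2 | 3 | 1,307 | 3 |
| Total | | | | | | 159 | | 6 | | 4 | | 4 |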

Using a k-means analysis42 on each of the non-correlated climatic variables, climatic partitioning assigned the populations to four or six groups. QLD, Queensland; MT, mountain; SF, state forest; NSW, New South Wales; N, number of individuals; MAT, mean annual temperature; IT, isothermality; AP, annual precipitation.

Leaf tissues (~300 mg) were powder homogenized in a MM400 mixer mill (Retsch GmbH, Haan, Germany), and genomic DNA was subsequently extracted using a modified CTAB method26. DNA concentration and quality were determined by agarose gel electrophoresis and a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA).

Microsatellite markers and their genotyping

In total, 110 SSR markers distributed across the 11 main scaffolds and one small scaffold (Supplementary Table S1) were used in this study, including 45 gSSRs2728 and 65 EST-SSRs293031. Polymerase chain reactions (PCRs) were performed in a 10 μL volume on a DNA Engine (Bio-Rad Laboratories, Hercules, CA, USA) following a fluorescent-dUTP based protocol32, with MgCl2 concentration and melting temperature (Tm) depending on the marker. The PCR products (1 μL) were diluted 1:10.5 with loading buffer (9.34 μL deionized formamide and 0.16 μL GeneScan 500LIZ) and then detected on an ABI 3130xl genetic analyzer using GeneMapper 4.1 software (Applied Biosystems, Foster City, CA, USA). Sixteen individuals, one randomly selected from each population, were re-amplified by PCR to test marker reproducibility.

Marker polymorphism, population diversity and structure

For each locus, number of alleles (NA), observed heterozygosity (HO), expected heterozygosity (HE) and fixation index (F) were estimated with GENALEX 6.4.133. Polymorphic information content (PIC) was computed with POWERMARKER 3.2534. Null allele frequencies (NAFs) were assessed following 50,000 bootstrap resamples in FREENA35. Hardy-Weinberg equilibrium (HWE, with manual Bonferroni correction for P values) was tested over all populations with GENEPOP 4.236. The software FSTAT 2.9.3.2 (http://www2.unil.ch/popgen/softwares/fstat.htm) was used to investigate per-locus inbreeding coefficients of individuals relative to the total population (FIT) and to the sub-population (FIS), FST and allelic richness (AR, based on the minimal sample size of nine individuals) as well as between-locus LD (with Bonferroni correction). Only the genomic SSRs that neither departed significantly from HWE (P < 0.01) nor showed FST outlying values in the LOSITAN analysis on 16 populations (implemented as stated below) were included (31 markers, see Results) in subsequent population diversity and structure analyses. For each population, average number of alleles per locus (ANA), HO, HE, F and AR were estimated as above. Significance of diversity differences between populations or population groups was tested using the ‘Compare among groups of samples’ function with the ‘two sided’ option and 1,000 permutations in FSTAT 2.9.3.2. Nei’s genetic distance between populations was calculated for construction of an unweighted pair group method with arithmetic mean (UPGMA) dendrogram in POWERMARKER 3.2534. The package GENALEX 6.4.133 was used under default settings to conduct principal coordinates analysis (PCoA) and to estimate overall between-population differentiation (FST) and molecular variance (AMOVA).
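The per-locus statistics named above follow standard definitions. A minimal sketch, assuming diploid genotypes encoded as pairs of allele sizes and the uncorrected estimators HE = 1 − Σp² and F = 1 − HO/HE, is given below. It is not the GENALEX implementation, and the function name and toy genotypes are hypothetical.

```python
from collections import Counter

def locus_stats(genotypes):
    """Return (NA, HO, HE, F) for one locus.

    genotypes: list of (allele1, allele2) tuples for diploid individuals.
    HE is the uncorrected expected heterozygosity 1 - sum(p_i^2);
    F is the fixation index 1 - HO/HE.
    """
    n = len(genotypes)
    counts = Counter(a for g in genotypes for a in g)
    total = 2 * n  # two allele copies per diploid individual
    freqs = [c / total for c in counts.values()]
    ho = sum(1 for a, b in genotypes if a != b) / n
    he = 1.0 - sum(p * p for p in freqs)
    f = 1.0 - ho / he if he > 0 else 0.0
    return len(counts), ho, he, f

# Hypothetical genotypes (allele sizes in bp), for illustration only
toy = [(120, 124), (120, 120), (124, 128), (120, 124)]
na, ho, he, f = locus_stats(toy)
```

With these toy genotypes HO exceeds HE, so F is negative, the situation that Table 2 reports for one population and treats as zero.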
The Bayesian clustering program STRUCTURE 2.3.437 was performed to assign individuals to a number (K) of genetically homogeneous clusters assuming an admixture model and correlated allele frequencies between populations. For each of K values (1−16), the Markov Chain Monte Carlo (MCMC) sampling was replicated with 10 iterations38 each following 100,000 burn-ins and 100,000 MCMC repetitions. The optimal K value was determined by the highest ΔK method39 in STRUCTURE HARVESTER 0.640 and by a complementary assessment of the change in lnP(K) slope with increase in each K value.
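The ΔK criterion can be illustrated with a short sketch. It assumes the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), with L taken as the mean lnP(K) over replicate runs. This is one reading of the formula rather than the STRUCTURE HARVESTER code, and the lnP(K) values are hypothetical.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp: dict mapping K -> list of lnP(K) values from replicate runs.

    Returns dict K -> deltaK for every K that has both neighbours K-1 and K+1.
    """
    means = {k: mean(v) for k, v in lnp.items()}
    out = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            sd = stdev(lnp[k])
            if sd > 0:  # deltaK is undefined when replicates are identical
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                out[k] = second_diff / sd
    return out

# Hypothetical replicate log-likelihoods, for illustration only
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-888, -880, -896],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

ΔK cannot be computed for the smallest and largest K tested, which is one reason the slope of lnP(K) is inspected as a complementary criterion.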

Climatic data and partitioning of populations

Values of 19 climatic variables during the years 1950–2000 were obtained for each original population location from a standard set of climate grids (http://www.worldclim.org/) at a spatial resolution of 30 arc-seconds41. Correlations between the climatic variables were computed using the R function cor.test, and only three non-correlated variables (mean annual temperature, isothermality and annual precipitation, see Results) were included in subsequent association analysis of candidate loci. The 16 populations were partitioned into homogeneous groups using a k-means analysis on each of the non-correlated climatic variables42, as implemented similarly in black spruce [Picea mariana (Mill.) B.S.P.]43.
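The partitioning step can be sketched as a one-dimensional k-means. The version below uses Lloyd's algorithm with a deterministic initialisation chosen only for illustration. The software, initialisation and rule for choosing the number of groups used in the study are not stated here, so these are assumptions, and the input values are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on one variable; returns a cluster index per value.

    Centres start at evenly spaced order statistics so the result is
    deterministic (an illustrative choice, not taken from the study).
    """
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centre for each value
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: each centre becomes the mean of its members
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centres[j])
        if new == centres:
            break
        centres = new
    return labels

# Hypothetical climate values for six populations, for illustration only
labels = kmeans_1d([10.0, 10.5, 20.0, 20.4, 30.1, 29.8], k=3)
```

Partitioning each variable separately means that a population can fall into different groups for temperature, isothermality and precipitation, as in Table 1.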

Detection of FST outlier loci

Three FST outlier detection methods were used to test all 110 SSR loci for evidence of natural selection among the population groups of each climatic partition. The first was a summary-statistics method44 implemented in LOSITAN45. The outlying values of FST were identified from a plot of FST vs. expected heterozygosity. Following the infinite allele mutation model, the distribution of FST vs. expected heterozygosity was calculated in LOSITAN with 100,000 simulation replicates under the option of neutral mean FST. Markers above the 95% or below the 5% confidence limit of the distribution were considered as candidates (outliers) for positive (divergent) or balancing selection, respectively. False discovery rate (FDR) was set to 0.01. The second was also a summary-statistics method44 modified for hierarchically structured populations46 as implemented in ARLEQUIN 3.547. As the hierarchical structure was considered, coalescent simulations (500,000) were implemented under a hierarchical island model to obtain a more realistic null distribution for FST statistics and therefore avoid possible false positives. Significant loci at the 95% or 5% confidence level were recognized as candidates for positive (divergent) selection or balancing selection, respectively. The third was a hierarchical Bayesian modeling method modified from Beaumont and Balding48 and implemented in BAYESCAN 2.149. Three runs were performed in BAYESCAN with standard parameters, except that the prior odds for the neutral model were set to 100. FDR was set to 0.01. A Bayes factor and its logarithm were generated for each marker and classified into five grades according to Jeffreys’ scale of evidence of selection (http://cmpg.unibe.ch/software/bayescan/index.html), namely, barely worth mentioning, substantial, strong, very strong and decisive evidence; loci in the last three grades were considered as candidates for divergent selection.
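The five-grade classification can be sketched as a lookup on the base-10 logarithm of the Bayes factor. The half-unit cut-offs below are the boundaries commonly quoted for Jeffreys' scale. They are an assumption, since the thresholds are not stated in the text, as are the inclusive boundary handling and the label for negative values.

```python
# Lower bounds on log10(Bayes factor) for each grade (assumed cut-offs)
GRADES = [
    (2.0, "decisive"),
    (1.5, "very strong"),
    (1.0, "strong"),
    (0.5, "substantial"),
    (0.0, "barely worth mentioning"),
]
# The last three grades are treated as candidates for divergent selection
CANDIDATE_GRADES = {"strong", "very strong", "decisive"}

def jeffreys_grade(log10_bf):
    """Return the grade of evidence for a given log10 Bayes factor."""
    for threshold, grade in GRADES:
        if log10_bf >= threshold:
            return grade
    return "no evidence"  # negative values favour the neutral model

def is_candidate(log10_bf):
    """True when the grade is strong, very strong or decisive."""
    return jeffreys_grade(log10_bf) in CANDIDATE_GRADES
```

Under these assumed cut-offs a locus needs a Bayes factor of at least 10 in favour of selection to be retained, which is stricter than the 95% limits used in the two summary-statistics methods.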

Detection of candidate SSR loci associated with environmental variables

The FST outlier loci detected above were corroborated using a spatial analysis method (SAM)13. The associations between allele frequencies and climatic variables were tested using the likelihood ratio (G) and Wald tests13. With the Bonferroni correction, significance was declared at the 99.99% confidence level, and a conservative alpha of 0.01 was adopted in the Wald test to reduce false positives. Moreover, significant associations were further evaluated in Office Excel 2010 (Microsoft Corp., Redmond, WA, USA) to correct for allele frequency autocorrelation using univariate linear regressions between group allele frequencies and environmental variables.
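The follow-up regression step amounts to an ordinary least-squares fit of group allele frequency on a climatic variable. A minimal sketch is given below. It does not reproduce the SAM logistic models or the G and Wald tests, and the group values are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return a, b, r2

# Hypothetical group means: temperature (°C) and frequency of one allele
mat = [17.0, 18.0, 19.0, 20.0, 21.0, 22.0]
freq = [0.30, 0.26, 0.22, 0.18, 0.14, 0.10]
intercept, slope, r2 = linear_fit(mat, freq)
```

A negative slope with a high coefficient of determination corresponds to the negatively linear pattern described for some associations in the Results.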

Results

In total, 1,857 alleles were identified at the 110 polymorphic SSR loci. Single-locus parameters exhibited striking differences among loci (Supplementary Table S2), with NA ranging from three to 52 (mean 16.9), HO from 0.209 to 1.000 (mean 0.623), HE from 0.200 to 0.890 (mean 0.706), AR from 1.434 to 3.835 (mean 3.027) and NAF from zero to 0.278 (mean 0.075). No pair of loci showed significant LD in two or more populations (P < 0.05), suggesting the independent segregation of the marker loci used. Twenty-eight loci deviated significantly from HWE (P < 0.01), including 12 gSSRs and 16 EST-SSRs (Supplementary Table S2). The 12 non-HWE gSSRs, along with gSSR loci (eight, including six non-HWE loci; Supplementary Fig. S1 and Table S2) that exhibited FST outlying values for the 16 populations, were excluded from subsequent population diversity and structure analyses. High levels of putatively neutral (gSSR) diversity were revealed in E. grandis populations (Table 2), e.g. HE ranging from 0.706 to 0.809 (mean 0.774) and AR from 4.295 to 5.300 (mean 4.929). Specifically, the five northern populations (codes Pic, Cop, MS, Fin and Cre, mean HO = 0.711) were significantly less diverse as compared to the remaining (southern) populations (mean HO = 0.758, P = 0.007). The fixation index (F) per population suggested little evidence of inbreeding, with a range between zero and 0.127 (mean 0.037). The overall FST values were generally low as demonstrated by between-population comparisons (mean FST = 0.037; Supplementary Table S3) and AMOVA analysis (among-population variation percentage being 3.7%, P < 0.001; Supplementary Table S4), indicating a weak population structure.
Table 2

Genetic diversity parameters of E. grandis populations based on 31 putatively neutral gSSR loci.

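| No. | Pop. code† | ANA | NPA | HO | HE | F | AR |
|---|---|---|---|---|---|---|---|
| 1 | Pic | 8.484 | 4 | 0.708 | 0.753 | 0.049 | 4.624 |
| 2 | Cop | 8.161 | 8 | 0.701 | 0.716 | 0.009 | 4.295 |
| 3 | MS | 7.903 | 8 | 0.717 | 0.748 | 0.040 | 4.557 |
| 4 | Fin | 8.903 | 7 | 0.692 | 0.706 | 0.005 | 4.459 |
| 5 | Cre | 9.903 | 4 | 0.738 | 0.736 | –0.002‡ | 4.693 |
| 6 | Kin | 8.613 | 7 | 0.761 | 0.801 | 0.050 | 5.280 |
| 7 | Bel | 8.323 | 3 | 0.760 | 0.791 | 0.045 | 5.256 |
| 8 | Bor | 10.548 | 6 | 0.752 | 0.801 | 0.055 | 5.235 |
| 9 | Con | 8.484 | 6 | 0.769 | 0.803 | 0.039 | 5.199 |
| 10 | Kil | 9.032 | 6 | 0.752 | 0.773 | 0.029 | 5.000 |
| 11 | MM | 11.516 | 10 | 0.709 | 0.809 | 0.127 | 5.144 |
| 12 | MT | 11.355 | 8 | 0.769 | 0.806 | 0.056 | 5.077 |
| 13 | ML | 9.065 | 8 | 0.768 | 0.775 | 0.004 | 4.918 |
| 14 | Bag | 9.516 | 13 | 0.742 | 0.792 | 0.065 | 4.953 |
| 15 | Ora | 8.290 | 11 | 0.788 | 0.804 | 0.022 | 5.300 |
| 16 | Bul | 8.258 | 12 | 0.767 | 0.770 | 0.000 | 4.870 |
| Mean (SE) | | 9.147 (1.129) | 7.560 (2.851) | 0.743 (0.009) | 0.774 (0.005) | 0.037 (0.011) | 4.929 (0.318) |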

†See Table 1 for full description of the populations. ‡The negative F value was treated as zero. ANA, average number of alleles per locus; NPA, number of private alleles; HO, observed heterozygosity; HE, expected heterozygosity; F, fixation index; AR, allelic richness.

Similar patterns of neutral population structure were observed in PCoA analysis, UPGMA dendrogram and STRUCTURE analysis (Fig. 2). For PCoA analysis (Fig. 2a), the first coordinate accounted for 33.2% of the variation and separated clearly the northern and southern populations. Further, the UPGMA dendrogram (Fig. 2b) based on Nei’s genetic distance confirmed the clustering of northern vs. southern populations, coinciding well with geographic distribution. Furthermore, the STRUCTURE analysis (Fig. 2c) illustrated the differentiation between the northern and southern populations, providing evidence for the presence of two genetically distinct clusters (i.e. K = 2).
Figure 2

Genetic structure of 16 Eucalyptus grandis populations based on 31 putatively neutral gSSR loci.

Full description of the populations can be found in Table 1. (a) Principal coordinates analysis (PCoA). (b) Unweighted pair group method with arithmetic mean (UPGMA) dendrogram. (c) Individual proportion and population membership to each of the clusters inferred in STRUCTURE analysis (K = 2).

Of the 19 climatic variables obtained (http://www.worldclim.org/), only three were not correlated with each other, namely, mean annual temperature, isothermality and annual precipitation. The k-means analysis partitioned optimally the 16 populations into six, four and four groups in mean annual temperature, isothermality and annual precipitation, respectively (Table 1). Each group comprised two or more populations, with the only exception of one population in group 4 for isothermality partition.

FST outlier loci

A total of 58 FST outlier loci were identified collectively using LOSITAN, ARLEQUIN and BAYESCAN against the three non-correlated climatic factors (Table 3): 35 (31.8%; 24 under positive and 11 under balancing selection) for mean annual temperature, 22 (20.0%; 10 positive and 12 balancing) for isothermality and 20 (18.2%; 13 positive and seven balancing) for annual precipitation. Four outliers were detected in all three climatic partitions, namely, EUCeSSR0755, EUCeSSR1070, Embra394 and EUCeSSR0893. In addition, the rates of EST-SSR outliers (20.0–33.8%) for all the climatic partitions were significantly greater (P = 0.006) than those of genomic SSRs (15.6–28.9%; Table 3).
Table 3

FST outliers detected for the three un-correlated climatic factors in LOSITAN45, ARLEQUIN47 and BAYESCAN49.

SSR locus | MAT (L†, A‡, B§) | Isothermality (L†, A‡, B§) | Annual precipitation (L†, A‡, B§)
EUCeSSR1061        
Embra180**  **    
EUCeSSR347       
Embra98        
EUCeSSR0502       
EUCeSSR0224        
EUCeSSR0276        
EUCeSSR0979      **  
Embra227        
Embra280**     **  
EUCeSSR313        
EUCeSSR384      
EUCeSSR0599        
Embra130      **  
EUCeSSR0035        
EUCeSSR151       
EUCeSSR686       
Embra242       
Embra120*        
EUCeSSR626        
EUCeSSR0455      *** 
Embra358*        
Embra304***        
EUCeSSR0103*        
Embra188***       
EUCeSSR0906       
Embra37        
Embra187**        
EUCeSSR0755****  * ***** 
Embra233*        
EUCeSSR231        
EUCeSSR0620   *     
EUCeSSR880*        
EUCeSSR479        
EUCeSSR1042***    *** 
Embra83        
Embra369*        
EUCeSSR1087       
EUCgSSR21        
EUCeSSR522*        
Embra197        
EUCeSSR1070***  *** *  
Embra88**** **    
Embra53*  **    
EUCeSSR0497*        
EUCeSSR0845        
Embra217      *  
EUCeSSR0679*        
EUCeSSR1044      *  
EUCeSSR0568        
Embra394****** ***** ***** 
EUCeSSR0893***********************
Embra269      *  
EUCeSSR292***** ***    
EUCeSSR349      *  
EUCeSSR209*        
EUCeSSR0849*** **    
EUCeSSR1145        
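| Summary (positive + balancing) | MAT | Isothermality | Annual precipitation |
|---|---|---|---|
| Sub-total, L | 28 (24 + 4) | 20 (9 + 11) | 15 (13 + 2) |
| Sub-total, A | 18 (8 + 10) | 12 (9 + 3) | 10 (5 + 5) |
| Sub-total, B | 1 | 1 | 1 |
| L ∩ A | 11 (10.0%; 8 + 3)¶ | 10 (9.1%; 8 + 2)¶ | 5 (4.5%; 5 + 0)¶ |
| L ∩ A ∩ B | 1 (0.9%) | 1 (0.9%) | 1 (0.9%) |
| L ∪ A ∪ B | 35 (31.8%; 24 + 11)¶ | 22 (20.0%; 10 + 12)¶ | 20 (18.2%; 13 + 7)¶ |
| gSSRs | 13 (28.9%)†† | 8 (17.8%)†† | 7 (15.6%)†† |
| EST-SSRs | 22 (33.8%)†† | 14 (21.5%)†† | 13 (20.0%)†† |
| Total (all three factors) | 58 (52.7%; 32 + 26)¶ | | |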

†Confidence levels of the distribution relative to neutral selection: *, above the 95% level; **, above the 99% level; ***, above the 99.9% level; underlined *, below the 5% level; underlined **, below the 1% level; underlined ***, below the 0.1% level. ‡Significance levels at the 95% confidence: *, 0.01 < P ≤ 0.05; **, 0.001 < P ≤ 0.01; ***, P ≤ 0.001. Significance levels at the 5% confidence: underlined *, 0.01 < P ≤ 0.05; underlined **, 0.001 < P ≤ 0.01. §Grades of evidence of selection: **, very strong evidence; ***, decisive evidence. ¶Percentage of the total number of markers used (110). ††Percentage of the total number of genomic SSR markers (45) or EST-SSR markers (65) used. The underlined asterisks and numbers are for balancing selection loci. MAT, mean annual temperature; L, LOSITAN; A, ARLEQUIN; B, BAYESCAN; L ∩ A, LOSITAN and ARLEQUIN; L ∩ A ∩ B, LOSITAN, ARLEQUIN and BAYESCAN; L ∪ A ∪ B, LOSITAN, ARLEQUIN or BAYESCAN.

LOSITAN, ARLEQUIN and BAYESCAN consistently detected the largest, intermediate and smallest numbers, respectively, of FST outlier loci for all three climatic partitions (Table 3 and Supplementary Fig. S2). Eleven (10.0%), 10 (9.1%) and five (4.5%) outliers were identified by both LOSITAN and ARLEQUIN for mean annual temperature, isothermality and annual precipitation, respectively (Table 3), while only one locus, EUCeSSR0893 (0.9%; Table 3), was identified by all three methods.
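The percentages in Table 3 follow directly from the outlier counts and the marker totals (110 SSRs, of which 45 are gSSRs and 65 are EST-SSRs). A short sketch that recomputes them, using the counts from Table 3 and variable names chosen here for illustration:

```python
N_TOTAL, N_GSSR, N_EST = 110, 45, 65  # marker counts stated in the Methods

def pct(count, total):
    """Percentage rounded to one decimal place, as reported in Table 3."""
    return round(100.0 * count / total, 1)

# Outlier counts per climatic partition, copied from Table 3
union = {"MAT": 35, "IT": 22, "AP": 20}  # LOSITAN, ARLEQUIN or BAYESCAN
gssr = {"MAT": 13, "IT": 8, "AP": 7}
est = {"MAT": 22, "IT": 14, "AP": 13}

union_pct = {k: pct(v, N_TOTAL) for k, v in union.items()}
gssr_pct = {k: pct(v, N_GSSR) for k, v in gssr.items()}
est_pct = {k: pct(v, N_EST) for k, v in est.items()}

# gSSR and EST-SSR outliers should add up to the union for each partition
consistent = all(gssr[k] + est[k] == union[k] for k in union)
overall_pct = pct(58, N_TOTAL)  # 58 distinct outlier loci across partitions
```

The per-partition counts sum to 77, which exceeds 58 because some loci are outliers in more than one climatic partition.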

Loci and alleles associated with climatic variables

A total of nine significant associations were detected between FST outlier allele frequencies and climatic variables, involving seven alleles from five loci (Table 4). Regression analysis indicated that three of the associations were significantly negatively linear (Fig. 3) while the remaining six were non-linear (Supplementary Fig. S3). Five, two and two alleles were significantly associated with mean annual temperature, isothermality and annual precipitation, respectively. Each allele was significantly associated with only one climatic factor, except for the 276 bp allele of EUCeSSR0755, which was associated with all three. In addition, two alleles (266 and 276 bp) of the same locus, EUCeSSR0755, were significantly associated with the same climatic factor, mean annual temperature.
Table 4

Loci and alleles significantly associated with the three non-correlated climatic variables in E. grandis.

| Locus-allele (bp) | Scaffold† | MAT | Isothermality | MAP | Putative function at E ≤ 10−5 (Organism; BlastX E value) |
|---|---|---|---|---|---|
| Embra180-120 | 1 | + | | | No significant match |
| EUCeSSR0755-266 | 6 | + | | | No significant match |
| EUCeSSR0755-276 | 6 | + | + | + | |
| EUCeSSR1044-406 | 10 | | | + | Zinc finger, C3HC4 type (RING finger) protein (Medicago truncatula; 5E-69) |
| Embra394-215 | 10 | + | | | Predicted: thionin-like protein 2 (Eucalyptus grandis; 7E-34) |
| EUCeSSR0849-212 | 11 | | + | | Predicted: uncharacterized protein LOC104426570 (Eucalyptus grandis; 2E-26) |
| EUCeSSR0849-232 | 11 | + | | | |

†Scaffolds as aligned to E. grandis genome sequence (version 1.1, http://www.phytozome.net/eucalyptus.php). A plus sign (+) indicates the significant association detected by both G and Wald tests. MAT, mean annual temperature; MAP, mean annual precipitation.

Figure 3

Linear regression for three significant associations between FST outlier allele frequencies and climatic variables.

Each dot represents a group of homogeneous populations in K-means climatic partition. (a) The 120 bp allele of locus Embra180 associated with mean annual temperature. (b,c) The 276 bp allele of EUCeSSR0755 associated with mean annual temperature and isothermality, respectively.

Discussion

High genetic diversity and low population structure in E. grandis

Despite the large geographical (latitudinal) range, E. grandis populations show high levels of neutral genetic diversity (mean HE = 0.774; Table 2), consistent with SSR-derived population diversity in other eucalypt species, e.g. HE being 0.82 in E. globulus Labill.50, 0.75 in E. gomphocephala24 and 0.739 in E. urophylla S.T. Blake51. Northern populations tend to be significantly less diverse than the southern populations, similar to the observations based on chloroplast DNA sequences52. It is possible that the relatively isolated northern populations have been colonized from the south52 as genetic diversity is generally higher in source populations than in younger populations derived from them51. Moreover, as E. grandis may cross with co-occurring close relatives such as E. saligna Smith, E. robusta Smith and E. resinifera Smith21, gene flow from the related species may have contributed to the high genetic diversity, in the southern populations in particular52. Additionally, all populations analysed here were sampled from a field trial, in which selection might have taken place during earlier seed germination, seedling establishment and juvenile survival stages, and the neutral genetic diversity could be thus more or less affected. Nevertheless, the overall high levels of genetic diversity may reflect the maintenance of large population size in E. grandis. The low levels of population differentiation (mean FST = 0.037) are less than the mean of widespread eucalypt species, e.g. mean FST of 0.062 and 0.055 at restriction fragment length polymorphism (RFLP) and SSR loci, respectively53. These results confirm also earlier findings of limited population structure in E. grandis based on isozyme markers23. Generally speaking, forest tree species including eucalypts have weak population structure54. Such weak population structure, in light of the large geographical disjunctions in the natural range of E. grandis (e.g. populations MS vs. Fin and Cre vs. 
Kin) that could act as a barrier to gene flow and induce genetic differentiation, may reflect recent emergence of the disjunctions from a more continuous distribution due to climate change52. In this study, besides the regions within the E. grandis distribution range sampled earlier by Jones et al.52, an additional isolated region (populations Fin and Cre) in central coastal Queensland was included, providing a more comprehensive picture of the population diversity and structure. These results can have important implications for conservation and utilization of the species. The high levels of population diversity and some genetic differentiation among populations detected herein could imply considerable potential for further genetic resource exploitation. The populations with high diversity and/or rich private alleles, particularly those under potential threat by predicted climate change, should be considered in future conservation programs. For the purpose of association mapping studies, the population subdivision information will be of practical importance. Similar to the k-means partition of populations according to climatic variables implemented in genome scans with black spruce43, the 16 populations of E. grandis analysed were partitioned into homogeneous groups to identify FST outliers related to genetic divergence under specific environmental factors. Such climatic partitions may offer better odds to identify genes potentially involved in adaptation to a specific climate factor43, and those outliers specific only to a climatic factor could provide clear support in this respect. Based on the climatic partition of population groups, 35, 22 and 20 outlier loci (Table 3) were identified collectively for mean annual temperature, isothermality and annual precipitation, respectively. 
Of these, eight, eight and five loci presented higher genetic differentiation (FST) values in at least two outlier detection methods than expected in the respective climatic variables, suggesting that they could have been involved in divergent selection among homogeneous population groups. In contrast, 11, 12 and seven loci presented lower FST values than expected in the respective variables, signaling balancing selection within population groups and/or homogeneous selection among groups. In addition, quantitative trait differentiation coefficients (QST) are 0.33 and 0.28 in 7-year-old tree height and diameter at breast height, respectively, for a range-wide provenance/family trial of E. grandis (Trial E110-155; J Luo, China Eucalypt Research Centre, China, pers. comm., assuming an overall coefficient of relatedness of 0.4 amongst open-pollinated siblings), being much larger than the mean FST (0.037; Supplementary Table S3) and thus indicating the role of divergent selection on quantitative traits56 (our field trial was not used for estimating QST due to abnormal growth caused by infection of gall wasp, Leptocybe invasa Fisher and La Salle). The rates of FST outlier loci detected collectively by the three methods (18.2–31.8% depending on climatic partition; Table 3) are comparable to those reported in several forest trees, e.g. 16.7% (3/18) at genomic SSR and EST-SSR markers in E. gomphocephala24, 22–53% at gene-linked loci in Quercus robur L. and Q. petraea (Matt.) Liebl.57 and 31.3% (10/32) at EST-SSRs in Castanopsis fargesii Franchet58. However, such rates are generally greater than those reported by other studies in plants and animals, e.g. 12.0% (3/25) at SSRs in eelgrass (Zostera marina L.)59, 11.4% (5/44) at genomic SSRs and EST-SSRs in Q. robur and Q. ellipsoidalis E.J. Hill60 and 0.3% (18/6145) at restriction site associated DNAs (RADs) in kokanee salmon (Oncorhynchus nerka Walbaum)61. 
Given that only one FST outlier (0.9%; Table 3) was detected with the more conservative Bayesian method of BAYESCAN, the high proportion of outliers reported here is likely attributable to the relaxed summary-statistics approaches implemented in LOSITAN and ARLEQUIN. As a consequence, false positive outliers may exist, and separating them from real outliers remains a challenge59. Moreover, although some outliers were common to the LOSITAN and ARLEQUIN analyses (5–11 depending on climatic partition; Table 3), LOSITAN consistently detected higher proportions of outliers than ARLEQUIN for the three climatic partitions, and it is not possible to determine which detection method is more powerful. Furthermore, given the multiplicity of natural habitats of E. grandis and other eucalypt species, the FST outliers identified herein can be used as candidates for further genome scans against extended population samples.
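The outlier rates quoted in this discussion are simple count ratios, and they can be re-derived from the counts given in the text. The following is a minimal sketch: the function name `percent` and the dictionary layout are illustrative, and the total of 110 SSRs for E. grandis is taken from the concluding paragraph of the article. Half-up decimal rounding is used because binary floating-point rounding would turn 31.25 into 31.2 rather than the quoted 31.3.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(n_outliers: int, n_loci: int) -> Decimal:
    """Outlier rate in percent, rounded half-up to one decimal place."""
    value = Decimal(100) * Decimal(n_outliers) / Decimal(n_loci)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (outliers, loci scanned, percentage quoted in the text)
quoted = {
    "E. gomphocephala, genomic SSR and EST-SSR": (3, 18, "16.7"),
    "Castanopsis fargesii, EST-SSR": (10, 32, "31.3"),
    "Zostera marina, SSR": (3, 25, "12.0"),
    "Q. robur and Q. ellipsoidalis, SSR": (5, 44, "11.4"),
    "Oncorhynchus nerka, RAD": (18, 6145, "0.3"),
    "E. grandis, BAYESCAN only": (1, 110, "0.9"),
}

# Recompute every rate and collect any entry that disagrees with the text.
recomputed = {name: percent(k, n) for name, (k, n, _) in quoted.items()}
mismatches = [
    name
    for name, (_, _, stated) in quoted.items()
    if recomputed[name] != Decimal(stated)
]

for name, rate in recomputed.items():
    print(f"{name}: {rate}%")
```

An empty `mismatches` list means every percentage in the comparison is consistent with its stated counts.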

Significant loci associated with climatic variables

Five (8.6%) of the FST outlier loci showed significant associations between allelic frequencies and climatic variations (Table 4). However, the remaining FST outliers should not be considered false positives, as they may be involved in adaptation on only a fraction of the climatic gradient or form part of adaptation complexes resulting from epistatic interactions43. Of the five significant loci, two (EUCeSSR1044 and Embra394) were homologous to a known gene or predicted protein, but three (Embra180, EUCeSSR0755 and EUCeSSR0849) did not have significant matches or known functions when they were BlastX searched against the NCBI non-redundant protein database (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The locus EUCeSSR1044, associated significantly with annual precipitation, was functionally annotated as a C3HC4 type RING finger protein. Similarly, two C3HC4 RING finger genes were identified as associated with local precipitation in black spruce43. In eukaryotes, C3HC4 type RING finger proteins act as E3 ubiquitin ligases, targeting numerous intracellular regulators in a wide range of physiological processes62,63, including abscisic acid (ABA) mediated drought stress responses in plants64,65. The plant hormone ABA can induce stomatal closure to mitigate transpirational water loss and is the master modulator of drought responses64. Changes in C3HC4 RING finger gene expression have been reported in response to drought stress in many plants, such as Arabidopsis thaliana (L.) Heynh.64,65 and Populus species66−68. Thus, it is not surprising to find a significant association between a C3HC4 type RING finger protein and annual precipitation, as critically low precipitation can cause drought stress. The marker Embra394, significantly associated with mean annual temperature, was functionally annotated as thionin-like protein 2. Thionin-like proteins have been well documented in plants to be involved in protection against pathogens, including bacteria and fungi69. 
Furthermore, in wheat (Triticum aestivum L.), thionin-like genes can be differentially expressed in response to heat stress70 or be involved in pathogen resistance induced by low temperatures71. As similarities in adaptation exist among phylogenetically remote plants43, thionin-like proteins may take part in temperature responses in Eucalyptus. The remaining three loci were of unknown function, including EUCeSSR0755, which was associated with all three climatic variables, and EUCeSSR0849, which was related to both mean annual temperature and isothermality (Table 4). These EST-derived loci may represent candidate genes for physiological response to environmental variation. Alternatively, they may be artefacts of genetic hitchhiking11, in which neutral markers are linked to a single gene under positive selection.
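To make concrete what an association between allele frequency and a climatic variable looks like, the sketch below computes a plain Pearson correlation across populations. This is a deliberately simplified stand-in and not the spatial analysis method used in the study. The function name `pearson_r` and all data values are hypothetical, chosen only to illustrate an allele whose frequency rises with mean annual temperature.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-population values (not data from the study).
mean_annual_temperature = [15.0, 17.0, 19.0, 21.0, 23.0]  # degrees Celsius
allele_frequency = [0.10, 0.20, 0.25, 0.40, 0.45]  # one allele at one locus

r = pearson_r(mean_annual_temperature, allele_frequency)
print(f"r = {r:.3f}")
```

A correlation close to 1 or -1 flags an allele whose frequency tracks the climatic gradient. A real analysis would also need a significance test with correction for multiple comparisons and some control for population structure, as hitchhiking and shared ancestry can both produce such patterns.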

Implications for adaptation genomics in perennial trees

The relatively large number of FST outlier loci (58 in total, 52.7%; Table 3) may indicate that adaptive genetic variation is a genome-wide phenomenon72, in which multiple loci of small adaptive effect are spread across the genome. Similarly, relatively large numbers of FST outliers were revealed by genome scans in some other broadleaved trees (e.g. Po. alba L.73) and conifers (e.g. black spruce43). Divergent natural selection can change allele frequencies and thereby increase the adaptive trait value of a population and the number of individuals with the fitness traits, leading to local adaptation6,7. Here, seven alleles were revealed to be significantly associated with the three climatic variables in E. grandis, indicating the importance of these climate factors as selective agents. Most of the significant alleles were associated with mean annual temperature, followed by isothermality and annual precipitation, suggesting the critical role of mean annual temperature in determining adaptive responses. Many studies have demonstrated the influence of these climatic agents on selection and adaptation in forest trees, such as black spruce43, E. gomphocephala24, C. fargesii58 and Pi. abies (L.) Karst.74. With the occurrence of globally rapid climatic change, in temperature in particular, these results could inform the adaptive responses of perennial trees to the environment. The existence of divergent natural selection, in addition to high genetic diversity, implies great potential for the evolution of adaptive traits in tree populations in the face of climate change6. The populations that harbour the highest frequency of favourable alleles underlying the adaptive traits will be the best adapted, and vice versa, under a climatic scenario of pronounced warming and declining rainfall19. 
Only one or two alleles of each significantly associated locus showed population variation with climatic variables, which may indicate that the significant alleles alter protein function in a way that the remaining alleles do not24. Given that none of the five significant loci has been directly investigated for its functional role in perennial trees, these loci and their variation patterns described here will provide a foundation for further functional characterisation work. In conclusion, the widespread woody species E. grandis provides an unusual opportunity to study local population adaptation in perennial trees. Although a relatively low density of multi-allelic loci was used for genome scans (only 110 SSRs spanning the E. grandis genome), we found clear footprints of divergent selection at a suite of loci. Two loci that showed significant associations with climatic variables represent alleles of putative genes with known functional importance for response to climatic factors. In addition, high genetic diversity levels and weak population structure were detected in E. grandis natural populations. These results have implications for understanding the genomic basis of adaptation to climate in perennial woody trees as well as for conservation and utilization of the important hardwood tree E. grandis. Further genetic association and expression studies would be required to confirm the functional role of the putative genes. Also, next-generation sequencing technologies, including those for genotyping of multi-allelic markers (e.g. short tandem repeats75), will help to investigate fine-scale genome-wide patterns of natural selection and local adaptation.

Additional Information

How to cite this article: Song, Z. et al. Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites. Sci. Rep. 6, 34941; doi: 10.1038/srep34941 (2016).
  50 in total

Review 1.  Range shifts and adaptive responses to Quaternary climate change.

Authors:  M B Davis; R G Shaw
Journal:  Science       Date:  2001-04-27       Impact factor: 47.728

2.  Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.

Authors:  Daniel Falush; Matthew Stephens; Jonathan K Pritchard
Journal:  Genetics       Date:  2003-08       Impact factor: 4.562

3.  Identifying adaptive genetic divergence among populations from genome scans.

Authors:  Mark A Beaumont; David J Balding
Journal:  Mol Ecol       Date:  2004-04       Impact factor: 6.185

Review 4.  Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits.

Authors:  J R Stinchcombe; H E Hoekstra
Journal:  Heredity (Edinb)       Date:  2007-02-21       Impact factor: 3.821

Review 5.  Ecological genomics of local adaptation.

Authors:  Outi Savolainen; Martin Lascoux; Juha Merilä
Journal:  Nat Rev Genet       Date:  2013-11       Impact factor: 53.242

Review 6.  Forest tree genomics: growing resources and applications.

Authors:  David B Neale; Antoine Kremer
Journal:  Nat Rev Genet       Date:  2011-02       Impact factor: 53.242

7.  Genetic evidence for ecological divergence in kokanee salmon.

Authors:  Matthew A Lemay; Michael A Russello
Journal:  Mol Ecol       Date:  2015-02-03       Impact factor: 6.185

8.  The effects of artificial selection on the maize genome.

Authors:  Stephen I Wright; Irie Vroh Bi; Steve G Schroeder; Masanori Yamasaki; John F Doebley; Michael D McMullen; Brandon S Gaut
Journal:  Science       Date:  2005-05-27       Impact factor: 47.728

9.  Heat stress-responsive transcriptome analysis in heat susceptible and tolerant wheat (Triticum aestivum L.) by using Wheat Genome Array.

Authors:  Dandan Qin; Haiyan Wu; Huiru Peng; Yingyin Yao; Zhongfu Ni; Zhenxing Li; Chunlei Zhou; Qixin Sun
Journal:  BMC Genomics       Date:  2008-09-22       Impact factor: 3.969

10.  Comparative transcriptomics of drought responses in Populus: a meta-analysis of genome-wide expression profiling in mature leaves and root apices across two genotypes.

Authors:  David Cohen; Marie-Béatrice Bogeat-Triboulot; Emilie Tisserant; Sandrine Balzergue; Marie-Laure Martin-Magniette; Gaëlle Lelandais; Nathalie Ningre; Jean-Pierre Renou; Jean-Philippe Tamby; Didier Le Thiec; Irène Hummel
Journal:  BMC Genomics       Date:  2010-11-12       Impact factor: 3.969

