
The Genetic Diversity and Geographic Differentiation of the Wild Soybean in Northeast China Based on Nuclear Microsatellite Variation.

Hongkun Zhao1,2, Yumin Wang3, Fu Xing1, Xiaodong Liu3, Cuiping Yuan3, Guangxun Qi2, Jixun Guo1, Yingshan Dong2,3.   

Abstract

In this study, the genetic diversity and population structure of 205 wild soybean core-collection accessions from Northeast China, grouped into nine latitude populations and nine longitude populations, were evaluated using SSR markers. A total of 973 alleles were detected at 43 SSR loci, and the average number of alleles per locus was 22.628. The mean Shannon information index (I) and the mean expected heterozygosity were 2.528 and 0.879, respectively. At the population level, the regions of 42°N and 124°E had the highest genetic diversity among all latitudes and longitudes. The greater the difference in latitude, the greater the genetic distance, whereas no similar trend was found among longitude populations. Three main clusters (1N, <41°N-42°N; 2N, 43°N-44°N; and 3N, 45°N->49°N) were assigned to the latitude populations. AMOVA showed that the genetic differentiation among latitude and longitude populations was 0.088 and 0.058, respectively, and that the majority of genetic variation occurred within populations. The Mantel test revealed that genetic distance was significantly correlated with geographical distance (r = 0.207, p < 0.05). Furthermore, spatial autocorrelation analysis showed a spatial structure (ω = 119.58, p < 0.01), and the correlation coefficient (r) decreased as distance increased within a radius of 250 km.

Year:  2018        PMID: 29977903      PMCID: PMC6011050          DOI: 10.1155/2018/8561458

Source DB:  PubMed          Journal:  Int J Genomics        ISSN: 2314-436X            Impact factor:   2.326


1. Introduction

The annual wild soybean (Glycine soja Sieb. and Zucc.), the direct progenitor of the cultivated soybean (Glycine max (Linn.) Merr.), is a predominantly self-pollinated annual plant species [1, 2]. It is widely distributed across most provinces of China, with the exceptions of Qinghai, Xinjiang, and Hainan [3]. In Northeast China, the wild soybean is well known for its abundant populations, high population density, and rich phenotypic types [4]. A total of 48 wild soybean in situ reserves have been established in China, 14 of which are located in Northeast China. A total of 8518 wild soybean accessions have been ex situ conserved in the National Gene Bank, and nearly half were collected from Northeast China [5]. Some reports have suggested that Northeast China was probably a very important diversity center for wild soybeans [6-8]. Genetic diversity is essential to population stability and is the basis of the evolution of species [9]. The genetic diversity and population structure of wild soybeans have been described in many reports [10-13]. The wild soybean in Northeast China, which is a very important ecotype, has been used in many previous studies [14-18]. The genetic diversity of wild soybeans in Northeast China was often compared with that in other areas using morphological traits and various molecular markers. Alternatively, wild soybeans from some specific areas have been analyzed for their distribution patterns, origin, evolution, classification, and so on. Accordingly, results have not been consistent because different accessions (or populations), numbers of samples, and analysis methods have been used. SSR markers have proven to be a reliable tool for determining the diversity of the wild soybean [19]. 
However, when the wild soybeans of Northeast China are treated as a single population, their genetic diversity and population structure have not been fully characterized with molecular markers; in particular, the genetic differentiation among geographical populations within this region has not been well investigated. The objectives of this work were to determine the extent of genetic variation in the wild soybean in Northeast China, to elucidate the geographical structure and genetic differentiation within and between latitude or longitude populations, and, ultimately, to provide valuable information for the scientific protection and efficient utilization of wild soybean resources in Northeast China.

2. Materials and Methods

2.1. Plant Material

All the wild soybean accessions in this study are preserved in the Gene Bank of Jilin Academy of Agricultural Sciences. A total of 205 wild soybean accessions from Northeast China were selected from the core collection established by Zhao et al. [20]. The collection sites of these samples were located within the region of 39–52°N and 119–133°E, covering 84 counties or districts. Nine latitude populations and nine longitude populations were defined at one-degree intervals wherever the sample size exceeded 10; otherwise, the samples were merged into the adjacent latitude or longitude population (see Figure 1 and Supplementary Table 1).
Figure 1

Geographic distribution map of wild soybean accessions in Northeast China. ■, ●, and ▲ represent the wild soybean core collections from Heilongjiang (HLJ), Jilin (JL), and Liaoning (LN), respectively.

2.2. SSR Analysis

Genomic DNA was extracted from a single fresh leaf of each accession using a modified CTAB method [21]. Forty-three SSR markers (see Supplementary Table 2), developed from 60 core loci [22] on 20 genetic linkage groups, were used to detect nuclear DNA variation. The primer sequences, with their linkage group locations, are available at https://www.soybase.org/dlpages/#soybasedata. The 25 μL PCR reaction mixture consisted of 2.5 μL 10x PCR buffer (100 mmol/L Tris-HCl (pH 8.3), 500 mmol/L KCl, 200 mmol/L MgCl2, 0.001% gelatin, 0.1% NP-40), 0.4 μL dNTP (2.5 mmol/L), 2.0 μL each of forward and reverse primers (10 pmol/L), 2.0 μL total DNA (20 ng/μL), 1.0 μL Taq DNA polymerase (2 U/μL), and 15.1 μL ddH2O. PCR was performed on a T100™ thermal cycler (Bio-Rad, USA) with the following cycle conditions: initial denaturation at 95°C for 5 min, followed by 35 cycles of denaturation at 95°C for 45 s, annealing at 52–57°C for 45 s, and extension at 68°C for 45 s, and a final extension at 72°C for 10 min. Amplified products were fractionated by electrophoresis through 6% denaturing polyacrylamide gels and visualized by silver staining, which detected fragments of 60 bp to 500 bp with a resolution of 2 bp. The size of each stained band was estimated from its migration distance relative to the 100 bp DNA ladder (MBI Fermentas) using AlphaView software (version 1.3.0.7).
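As a quick arithmetic check on the recipe above, the following Python sketch sums the listed component volumes and derives the template DNA mass and Taq units per reaction. The dictionary keys are illustrative labels, not names from the protocol; only the volumes and stock concentrations come from the text.

```python
# Component volumes (in microlitres) as listed in the PCR recipe.
# The key names are illustrative labels only.
components_ul = {
    "10x PCR buffer": 2.5,
    "dNTP": 0.4,
    "forward primer": 2.0,
    "reverse primer": 2.0,
    "total DNA": 2.0,
    "Taq DNA polymerase": 1.0,
    "ddH2O": 15.1,
}

# Total reaction volume; rounded to one decimal to absorb float noise.
total_ul = round(sum(components_ul.values()), 1)

# Template mass per reaction: 2.0 uL at 20 ng/uL.
dna_ng = components_ul["total DNA"] * 20.0

# Enzyme units per reaction: 1.0 uL at 2 U/uL.
taq_units = components_ul["Taq DNA polymerase"] * 2.0

print(total_ul, dna_ng, taq_units)
```

The volumes add up to the stated 25 μL reaction, with 40 ng of template DNA and 2 U of Taq polymerase per reaction.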

2.3. Data Analysis

The fragments amplified from genomic DNA by each SSR marker were scored according to their migration differences, and the data were converted to the required format in Microsoft Excel. The number of alleles ($N_a$), Shannon-Weaver index ($I$), expected heterozygosity ($H_e$), observed heterozygosity ($H_o$), fixation index ($F_{is}$), genetic differentiation coefficient ($F_{st}$), genetic identity and genetic distance, analysis of molecular variance (AMOVA), Mantel tests, and spatial autocorrelation coefficients were computed with GenAlEx v6.5 [23]. The Shannon-Weaver index and expected heterozygosity were calculated as $I = -\sum_i p_i \ln p_i$ and $H_e = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele. $H_o$ is generally lower than $H_e$ because of inbreeding, and $F_{is} = (H_e - H_o)/H_e = 1 - H_o/H_e$. The outcrossing rate ($t$) was calculated as $t = (1 - F_{is})/(1 + F_{is})$ [24]. Based on the matrix of Nei's genetic distance [25], the dendrogram was constructed using NTSYSpc21 [26]. Spatial autocorrelation analysis was performed using the “spatial-single pop” option; the distance class size was set to 100 km, and the numbers of permutations and bootstraps were both set to 999. The population genetic structure was inferred with STRUCTURE 2.3.4 [27]. Values of k from 1 to 12 were tested, and ten runs were performed for each value of k to test the stability of the results. Each MCMC (Markov chain Monte Carlo) run used a burn-in of 100,000 followed by 200,000 iterations. The number of genetic clusters was inferred from Δk, where $\Delta k = \mathrm{mean}(|\ln P(k+1) - 2\ln P(k) + \ln P(k-1)|)/\mathrm{sd}(\ln P(k))$ [28].
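The per-locus statistics can be illustrated with a minimal Python sketch. It is not the GenAlEx implementation; the function names and the four-allele example locus are hypothetical. The fixation index is computed with the standard definition 1 − H_o/H_e, which reproduces the values reported in Table 1 (for example, satt005 with H_o = 0.025 and H_e = 0.924).

```python
import math

def shannon_index(freqs):
    """Shannon information index: I = -sum(p_i * ln p_i), skipping p_i = 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def expected_heterozygosity(freqs):
    """Expected heterozygosity: H_e = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def fixation_index(h_obs, h_exp):
    """Fixation index: F_is = 1 - H_o / H_e."""
    return 1.0 - h_obs / h_exp

def outcrossing_rate(f_is):
    """Outcrossing rate: t = (1 - F_is) / (1 + F_is)."""
    return (1.0 - f_is) / (1.0 + f_is)

# Hypothetical locus with four equally frequent alleles (illustration only).
freqs = [0.25, 0.25, 0.25, 0.25]
i_example = shannon_index(freqs)             # equals ln(4)
he_example = expected_heterozygosity(freqs)  # equals 0.75

# Values for satt005 taken from Table 1: H_o = 0.025, H_e = 0.924.
f_is = fixation_index(0.025, 0.924)
t = outcrossing_rate(f_is)
print(round(f_is, 3), round(t, 3))
```

For satt005 this gives a fixation index of 0.973 and an outcrossing rate of 0.014, matching the first row of Table 1.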

3. Results

3.1. SSR Polymorphism and Geographic Variation

Two hundred five representative wild soybean accessions across Northeast China were used in this study. A total of 973 alleles were detected from 43 SSR loci, and the percentage of polymorphic loci was 100%. The number of alleles per SSR marker varied from 13 (Satt309) to 36 (Satt286), with an average of 22.628. The mean Shannon information index ($I$) and expected heterozygosity ($H_e$) were 2.528 and 0.879, respectively (see Table 1).
Table 1

Genetic diversity of 205 wild soybean accessions by 43 nSSRs.

Number  Primer   LG       $N_a$   $I$    $H_o$  $H_e$  $F_{is}$  $t$
1       satt005  D1b + W  28      2.862  0.025  0.924  0.973     0.014
2       satt022  N        23      2.593  0.010  0.890  0.989     0.006
3       satt099  L        17      2.257  0.020  0.862  0.976     0.012
4       satt112  E        19      2.571  0.010  0.909  0.989     0.006
5       satt146  F        21      2.582  0.034  0.902  0.962     0.019
6       satt168  B2       21      2.743  0.000  0.915  1.000     0.000
7       satt180  C1       18      2.145  0.005  0.813  0.994     0.003
8       satt184  D1a + Q  24      2.658  0.005  0.900  0.995     0.003
9       satt197  B1       31      2.633  0.005  0.876  0.994     0.003
10      satt216  D1b + W  26      2.295  0.005  0.835  0.994     0.003
11      satt226  D2       22      2.477  0.005  0.888  0.994     0.003
12      satt236  A1       17      2.495  0.015  0.903  0.984     0.008
13      satt239  I        22      2.295  0.005  0.822  0.994     0.003
14      satt242  K        22      2.580  0.005  0.882  0.994     0.003
15      satt243  O        26      2.951  0.015  0.937  0.984     0.008
16      satt267  D1a + Q  18      2.440  0.059  0.890  0.934     0.034
17      satt268  E        18      2.284  0.005  0.859  0.994     0.003
18      satt279  H        28      2.815  0.005  0.916  0.995     0.003
19      satt281  C2       29      2.869  0.005  0.920  0.994     0.003
20      satt286  C2       36      3.124  0.020  0.941  0.979     0.011
21      satt300  A1       20      2.540  0.005  0.903  0.995     0.003
22      satt307  C2       26      2.877  0.061  0.923  0.934     0.034
23      satt308  M        27      2.718  0.015  0.893  0.983     0.009
24      satt309  G        13      1.424  0.005  0.586  0.992     0.004
25      satt334  F        20      2.384  0.010  0.862  0.988     0.006
26      satt345  O        26      2.827  0.005  0.921  0.995     0.003
27      satt346  M        16      2.075  0.005  0.787  0.994     0.003
28      satt352  G        24      2.675  0.010  0.902  0.989     0.006
29      satt373  L        22      2.311  0.005  0.847  0.994     0.003
30      satt386  D2       21      2.489  0.005  0.891  0.994     0.003
31      satt390  A2       25      2.593  0.000  0.894  1.000     0.000
32      satt429  A2       26      2.786  0.005  0.916  0.995     0.003
33      satt431  J        23      2.628  0.005  0.893  0.995     0.003
34      satt434  H        19      2.394  0.000  0.878  1.000     0.000
35      satt453  B1       20      2.301  0.005  0.847  0.994     0.003
36      satt462  L        23      2.618  0.010  0.895  0.989     0.006
37      satt487  O        19      2.279  0.005  0.866  0.994     0.003
38      satt530  N        19      2.330  0.005  0.868  0.994     0.003
39      satt571  B2       21      2.520  0.026  0.897  0.971     0.015
40      satt586  F        33      2.994  0.005  0.932  0.995     0.003
41      satt588  K        26      2.619  0.000  0.896  1.000     0.000
42      satt590  M        22      2.570  0.029  0.899  0.967     0.017
43      satt596  J        16      2.071  0.005  0.835  0.994     0.003
Mean    -        -        22.628  2.528  0.011  0.879  0.987     0.007

LG = linkage group.

At the population level (see Table 2), the Shannon information index ($I$) of the nine latitude populations ranged from 1.300 to 2.419, with an average of 1.716; the expected heterozygosity ($H_e$) ranged from 0.661 to 0.884, with an average of 0.756. The region of 42°N had the highest genetic diversity among all latitudes. The Shannon information index ($I$) of the nine longitude populations varied from 1.659 to 2.368, averaging 1.919; the expected heterozygosity ($H_e$) varied from 0.751 to 0.880, averaging 0.805. The region of 124°E had the highest genetic diversity among all longitudes. As shown in Tables 3 and 4, the smaller the latitude difference was, the higher the genetic identity was; however, similar trends were not found in longitude populations. Two major geographical clustering groups (N and S) can be seen in the UPGMA dendrogram. Group N consists of four northern latitude groups (46°N–>49°N), and group S consists of five southern latitude groups (<41°N–45°N) (see Figure 2). The results indicate that the genetic diversity of wild soybean accessions in Northeast China is related to their latitudinal origin.
Table 2

Genetic diversity of different geographical populations in Northeast China.

Latitude Pop.  $I$    $H_e$  Longitude Pop.  $I$    $H_e$
<41°N          1.915  0.808  <122°E          1.959  0.820
42°N           2.419  0.884  123°E           1.808  0.789
43°N           2.063  0.816  124°E           2.368  0.880
44°N           1.621  0.751  125°E           2.067  0.833
45°N           1.882  0.790  126°E           2.094  0.827
46°N           1.302  0.661  127°E           1.771  0.780
47°N           1.521  0.733  128°E           1.661  0.769
48°N           1.300  0.662  129°E           1.880  0.797
>49°N          1.419  0.699  >130°E          1.659  0.751
Total          1.716  0.756  Total           1.919  0.805

Pop. = population.

Table 3

Nei's genetic distance and genetic differentiation among latitude populations.

Group  <41°N  42°N   43°N   44°N   45°N   46°N   47°N   48°N   >49°N
<41°N  -      0.042  0.079  0.101  0.111  0.183  0.150  0.178  0.162
42°N   0.447  -      0.040  0.061  0.062  0.120  0.091  0.119  0.105
43°N   0.674  0.346  -      0.049  0.066  0.139  0.111  0.140  0.119
44°N   0.790  0.489  0.334  -      0.064  0.156  0.120  0.142  0.126
45°N   0.941  0.499  0.435  0.402  -      0.096  0.080  0.118  0.112
46°N   1.428  0.801  0.818  0.825  0.446  -      0.093  0.155  0.163
47°N   1.441  0.788  0.800  0.784  0.486  0.417  -      0.086  0.098
48°N   1.432  0.858  0.886  0.762  0.615  0.640  0.414  -      0.074
>49°N  1.491  0.896  0.823  0.770  0.674  0.780  0.550  0.349  -

Pop. = population; genetic differentiation coefficient ($F_{st}$) above the diagonal; Nei's genetic distance below the diagonal.

Table 4

Nei's genetic distance and genetic differentiation among longitude populations.

Group   <122°E  123°E  124°E  125°E  126°E  127°E  128°E  129°E  >130°E
<122°E  -       0.103  0.049  0.082  0.096  0.124  0.106  0.095  0.132
123°E   0.974   -      0.055  0.052  0.044  0.066  0.075  0.065  0.083
124°E   0.535   0.535  -      0.036  0.039  0.060  0.055  0.049  0.069
125°E   0.818   0.438  0.404  -      0.036  0.052  0.065  0.042  0.076
126°E   0.929   0.345  0.374  0.306  -      0.035  0.035  0.037  0.034
127°E   1.304   0.504  0.563  0.426  0.286  -      0.060  0.061  0.072
128°E   0.980   0.572  0.520  0.537  0.294  0.460  -      0.050  0.056
129°E   0.897   0.500  0.479  0.364  0.298  0.468  0.397  -      0.068
>130°E  1.168   0.545  0.530  0.533  0.227  0.463  0.377  0.443  -

Pop. = population; genetic differentiation coefficient ($F_{st}$) above the diagonal; Nei's genetic distance below the diagonal.

Figure 2

UPGMA dendrogram based on Nei's genetic identity among the latitudes.
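The two-group split can be checked with a small pure-Python sketch of UPGMA (average-linkage clustering). This is not the NTSYSpc procedure used in the study; it assumes that the below-diagonal entries of Table 3 are the Nei's genetic distances described in Section 2.3, and the helper name `upgma` is hypothetical.

```python
from itertools import combinations

labels = ["<41°N", "42°N", "43°N", "44°N", "45°N", "46°N", "47°N", "48°N", ">49°N"]

# Below-diagonal values of Table 3, row by row (assumed to be Nei's distances).
lower = [
    [],
    [0.447],
    [0.674, 0.346],
    [0.790, 0.489, 0.334],
    [0.941, 0.499, 0.435, 0.402],
    [1.428, 0.801, 0.818, 0.825, 0.446],
    [1.441, 0.788, 0.800, 0.784, 0.486, 0.417],
    [1.432, 0.858, 0.886, 0.762, 0.615, 0.640, 0.414],
    [1.491, 0.896, 0.823, 0.770, 0.674, 0.780, 0.550, 0.349],
]

def upgma(lower):
    """Return the merge history [(cluster_a, cluster_b, height), ...]."""
    n = len(lower)
    # Pairwise distances between the original populations.
    dist = {frozenset((i, j)): lower[i][j] for i in range(n) for j in range(i)}

    def average(a, b):
        # UPGMA distance: mean over all pairs of members of the two clusters.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((a, b, average(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges

merges = upgma(lower)
# The last merge joins the two top-level groups of the dendrogram.
top_a, top_b, _ = merges[-1]
groups = sorted([[labels[i] for i in top_a], [labels[i] for i in top_b]])
for group in groups:
    print(group)
```

Under these assumptions, the final merge separates the five populations from <41°N to 45°N from the four populations from 46°N to >49°N, in line with groups S and N described in the text.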

3.2. Population Structure and Genetic Differentiation

STRUCTURE was run to infer the genetic structure of the predefined latitude and longitude populations. Δk was highest at k = 3, which indicated three main clusters (see Table 5). For the latitude populations, 83.8% of individuals from the <41°N region and 50.2% of individuals from the 42°N region were assigned to Cluster1N, most individuals from the 43°N and 44°N regions to Cluster2N, and most individuals from the 45°N to >49°N regions to Cluster3N. These results were roughly consistent with those of the hierarchical cluster analysis (see Figure 2). Some admixture was found among the <41°N to 45°N regions, which are probably important transitional areas. For the longitude populations, most of the individuals from the <122°E region were separated from all others and formed one cluster (Cluster1E); the 123°E, 125°E, and 129°E regions were assigned to Cluster2E, and the 127°E, 128°E, and >130°E regions to Cluster3E. Admixture was widespread among the nine predefined longitude populations. The 42°N, 124°E, and 126°E regions were distinctive in that no single cluster accounted for more than 60% of their membership.
Table 5

Inferred population structure based on latitude populations and longitude populations.

       Inferred clusters                        Inferred clusters
Pop.   Cluster1N  Cluster2N  Cluster3N  Pop.    Cluster1E  Cluster2E  Cluster3E
<41°N  0.158      0.004      0.838      <122°E  0.891      0.106      0.003
42°N   0.431      0.066      0.502      123°E   0.078      0.690      0.232
43°N   0.916      0.019      0.065      124°E   0.462      0.360      0.179
44°N   0.943      0.054      0.002      125°E   0.122      0.714      0.164
45°N   0.246      0.741      0.013      126°E   0.029      0.433      0.539
46°N   0.034      0.962      0.003      127°E   0.002      0.303      0.695
47°N   0.021      0.977      0.003      128°E   0.053      0.305      0.642
48°N   0.052      0.946      0.002      129°E   0.110      0.614      0.276
>49°N  0.151      0.847      0.002      >130°E  0.003      0.203      0.795

Pop. = population.
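The Δk criterion used to choose k can be sketched as follows. This is one common reading of the formula in Section 2.3 (absolute second difference of the mean lnP(k) across replicate runs, divided by the standard deviation of lnP(k)); it is an assumption, not the exact procedure used by the authors, and the lnP values below are hypothetical numbers chosen only to show the calculation.

```python
from statistics import mean, stdev

def delta_k(lnp_runs):
    """Return {k: delta_k} for every k that has both neighbours k-1 and k+1.

    lnp_runs maps each k to the list of lnP(k) values from replicate runs.
    Assumes the replicate values for a given k are not all identical
    (otherwise the standard deviation is zero).
    """
    means = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_runs[k])
    return result

# Hypothetical lnP(k) values for three replicate runs per k (not study data).
lnp_runs = {
    1: [-1000, -1002, -998],
    2: [-900, -903, -897],
    3: [-700, -701, -699],
    4: [-690, -695, -685],
    5: [-685, -695, -675],
}

dk = delta_k(lnp_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this toy example the likelihood rises sharply up to k = 3 and then flattens, so Δk peaks at k = 3. Δk is undefined for the smallest and largest k tested, because each lacks one neighbour.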

The $F_{st}$ value was used to evaluate the genetic differentiation of wild soybean populations at the scales of latitude and longitude in Northeast China (see Tables 3 and 4). Pairwise $F_{st}$ values for latitude populations ranged from 0.040 to 0.183, and pairwise $F_{st}$ values for longitude populations ranged from 0.034 to 0.132. In general, moderate differentiation (0.05 < $F_{st}$ < 0.15) [29] was observed between most latitude and longitude populations, and the genetic differentiation among adjacent groups was relatively low. These results were confirmed by AMOVA; most of the genetic variation was found within latitude and longitude populations (see Table 6).
Table 6

AMOVA analysis of different geographical populations.

Group      Source of variation  SS        MS      Est. Var.  Percentage of variation  $F_{st}$  p
Latitude   Among pops           723.151   90.394  1.695      9%                       0.088     0.001
Latitude   Within pops          7058.358  17.602  17.602     91%                      -         -
Longitude  Among pops           539.012   67.376  1.111      6%                       0.058     0.001
Longitude  Within pops          7242.498  18.061  18.061     94%                      -         -

Probability, p (rand ≥ data), for $F_{st}$ is based on standard permutation across the full data set. $F_{st}$ = Est. Var. among pops/(Est. Var. among pops + Est. Var. within pops); SS = sum of squares; MS = mean square; Est. Var. = estimated variance.
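The relationship between the estimated variance components and the reported differentiation values can be verified with a short Python sketch. The function name is hypothetical; the variance components are the Est. Var. values from Table 6.

```python
def amova_fst(var_among, var_within):
    """F_st = Est. Var. among pops / (Est. Var. among + Est. Var. within)."""
    return var_among / (var_among + var_within)

# Estimated variance components from Table 6.
fst_latitude = amova_fst(1.695, 17.602)
fst_longitude = amova_fst(1.111, 18.061)

# The same ratio, expressed as a percentage, gives the share of variation
# found among populations.
pct_latitude = round(100 * fst_latitude)
pct_longitude = round(100 * fst_longitude)

print(round(fst_latitude, 3), round(fst_longitude, 3))
print(pct_latitude, pct_longitude)
```

These ratios reproduce the tabulated values of 0.088 and 0.058, and the among-population shares of 9% and 6%.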

The Mantel test indicated that there was a positive correlation between geographic and genetic distance (r = 0.207, p < 0.05), which suggests that geographic distance limits gene flow among populations and influences the genetic structure. For further analyses, spatial autocorrelation analysis was performed by distance classes of 100 km, and a general decline was found in the correlation coefficient (r) with distance. The correlation values were negative and significant up to 250 km. This revealed that there is a clinal spatial structure in wild soybeans in Northeast China (ω = 119.58, p < 0.01) (see Figure 3).
Figure 3

Results of spatial structure analysis. r: solid lines represent spatial autocorrelation coefficients; U and L: dashed lines represent 95% confidence interval.
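The logic of the Mantel test can be sketched as a permutation test on two distance matrices. This is an illustrative pure-Python version, not the GenAlEx routine; the function names, the five site positions, the toy genetic distances, and the choice of 999 permutations for this test are all assumptions made for the example.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabelling rows and columns by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(geo, gen, permutations=999, seed=0):
    """Return (r_obs, p) for a one-sided Mantel test of positive correlation."""
    identity = list(range(len(geo)))
    x = upper_triangle(geo, identity)
    r_obs = pearson(x, upper_triangle(gen, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # shuffle sample labels of the genetic matrix
        if pearson(x, upper_triangle(gen, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical sites along a transect (km) with genetic distance rising
# linearly with geographic distance (illustration only, not study data).
positions_km = [0, 100, 200, 400, 700]
geo = [[abs(a - b) for b in positions_km] for a in positions_km]
gen = [[0.0 if i == j else 0.1 + 0.001 * geo[i][j] for j in range(5)] for i in range(5)]

r_obs, p_value = mantel(geo, gen)
print(r_obs, p_value)
```

Permuting the labels of one matrix, rather than shuffling individual cells, preserves the dependence among pairwise distances that share a sample, which is why the Mantel test uses this scheme.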

4. Discussion

The wild soybean in Northeast China is an important ecotype, and its genetic diversity has been widely studied using phenotypic traits [7, 30, 31] and molecular markers [17]. The common view has been that the wild soybean in this region possesses high genetic variation. In this study, the average allele number, Shannon index ($I$), and expected heterozygosity ($H_e$) were 22.628, 2.528, and 0.879, respectively, which were considerably higher than previously reported values [32-34]. This rich genetic variation could be attributed to the fact that the samples were selected from a core collection, which is defined as a subset of accessions that represents the genetic diversity of a crop species with minimal repetitiveness [35, 36]. In addition, the large sample size and diverse geographical origins might also contribute to the high diversity: the 205 samples used in this study account for 85% of the 242 Northeast China core-collection accessions developed by Zhao et al. [20]. These accessions may have a continuous distribution in Northeast China (see Supplementary Table 1). Furthermore, the use of 43 SSR primers covering 20 linkage groups might also have contributed to the detection of richer genetic variation [37]. The wild soybean is widely distributed in Northeast Asia. The higher its genetic diversity is, the greater its habitat-expansion capacity and environmental adaptation are [38]. Therefore, it also forms a specific natural distribution pattern [39]. In our study, the results indicate that the regions of 42°N and 124°E have the highest levels of genetic diversity (see Table 2), and the results roughly agree with those of previous studies based on morphological traits [7]. 
The results support the view that the genetic diversity of wild soybeans in Northeast China is related to latitude but not to longitude; three evolutionarily significant units were distinguished by latitude, corresponding to regions of <41°N-42°N, 43°N-44°N, and 45°N->49°N (see Tables 3–5). Moderate differentiation occurred among the latitude populations ($F_{st}$ = 0.088) and the longitude populations ($F_{st}$ = 0.058) (see Table 6). This implies that natural selection might be the main cause of genetic structure [8, 29]. Previous studies have revealed that wild soybean genotypes exhibit regional distributions at different geographical scales [15, 40, 41], which are especially associated with latitudinal origin. However, some studies have also reported that genetic differentiation was associated with longitudinal origin; for example, Leamy et al.'s results showed that the four genetic groups (Central China, Northern China, Korea, and Japan) differed more in longitude than in latitude [13]. Possible explanations for those results may include small sample size, large geographic span, strait isolation, and diverse ecosystems. Genetic structure is mainly determined by the breeding system, gene flow, isolation by distance, and so on [42]. Wild soybean is a predominantly self-pollinating plant with limited pollen flow. In general, self-pollination-dominated plants have an average $G_{st}$ of 0.51, so variation among populations accounts for more than half of the total genetic variation; for out-crossing-dominated species, the average $G_{st}$ is 0.10, and 90% of genetic variation occurs within populations [43]. In the present study, most of the genetic variation was found between individuals within populations, with less than 10% among populations (see Table 6). This suggests that the genetic differentiation among latitude or longitude populations in Northeast China is similar to that of out-crossing-dominated species [17, 40]. 
Zhao speculated that this phenomenon could be explained by out-crossing rates and long-distance gene flow [44]. Our results show that the out-crossing rate of Northeast China wild soybeans is only 0.7% ($F_{is}$ = 0.987) (see Table 1), which confirms the selfing mating system of the species; this mating system plays an important role in maintaining a strong genetic structure. The Mantel test revealed a positive relationship between geographic and genetic distance (r = 0.207, p < 0.05), which indicates that geographical isolation has also been an important factor in forming the current genetic structure of wild soybeans in Northeast China. Spatial autocorrelation analysis revealed that the correlation between geographical distance and genetic distance is limited. The wild soybean in Northeast China has become one of the most severely endangered wild plant species because of human activities [45]. The genetic diversity of wild soybeans is high, indicating that the wild soybean in this region has great potential for evolution, and in situ conservation is preferable. Although more wild soybean accessions from Northeast China than from other areas are conserved ex situ in the National Gene Bank, the collection from this area is still very limited, so further investigation and collection work in this region is necessary. According to the distribution patterns of the genetic diversity of wild soybeans in Northeast China, the conservation strategy should emphasize the protection of individuals, and protection in areas with high genetic diversity should be prioritized.

5. Conclusions

In summary, the genetic diversity and geographic population structure of the wild soybean in Northeast China were investigated for the first time both as a single population and across different latitude and longitude populations. The distribution pattern of genetic variation is related to latitude, and the highest level of genetic diversity was found at 42°N; protection in areas with higher genetic diversity should be prioritized. This study indicates that natural selection for adaptation to temperature and photoperiod, a selfing mating system, and isolation by distance resulted in the current population structure of wild soybean in Northeast China.
  13 in total

1.  Inference of population structure using multilocus genotype data.

Authors:  J K Pritchard; M Stephens; P Donnelly
Journal:  Genetics       Date:  2000-06       Impact factor: 4.562

2.  Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study.

Authors:  G Evanno; S Regnaut; J Goudet
Journal:  Mol Ecol       Date:  2005-07       Impact factor: 6.185

3.  Extraction of DNA from milligram amounts of fresh, herbarium and mummified plant tissues.

Authors:  S O Rogers; A J Bendich
Journal:  Plant Mol Biol       Date:  1985-03       Impact factor: 4.076

4.  Comparative phylogeography and postglacial colonization routes in Europe.

Authors:  P Taberlet; L Fumagalli; A G Wust-Saucy; J F Cosson
Journal:  Mol Ecol       Date:  1998-04       Impact factor: 6.185

5.  Genetic diversity and peculiarity of annual wild soybean (G. soja Sieb. et Zucc.) from various eco-regions in China.

Authors:  Zixiang Wen; Yanlai Ding; Tuanjie Zhao; Junyi Gai
Journal:  Theor Appl Genet       Date:  2009-05-18       Impact factor: 5.699

6.  Population structure of the wild soybean (Glycine soja) in China: implications from microsatellite analyses.

Authors:  Juan Guo; Yifei Liu; Yunsheng Wang; Jianjun Chen; Yinghui Li; Hongwen Huang; Lijuan Qiu; Ying Wang
Journal:  Ann Bot       Date:  2012-07-11       Impact factor: 4.357

7.  GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update.

Authors:  Rod Peakall; Peter E Smouse
Journal:  Bioinformatics       Date:  2012-07-20       Impact factor: 6.937

8.  Environmental versus geographical effects on genomic variation in wild soybean (Glycine soja) across its native range in northeast Asia.

Authors:  Larry J Leamy; Cheng-Ruei Lee; Qijian Song; Ibro Mujacic; Yan Luo; Charles Y Chen; Changbao Li; Susanne Kjemtrup; Bao-Hua Song
Journal:  Ecol Evol       Date:  2016-08-14       Impact factor: 2.912

9.  Genetic diversity and population structure: implications for conservation of wild soybean (Glycine soja Sieb. et Zucc) based on nuclear and chloroplast microsatellite variation.

Authors:  Shuilian He; Yunsheng Wang; Sergei Volis; Dezhu Li; Tingshuang Yi
Journal:  Int J Mol Sci       Date:  2012-10-03       Impact factor: 5.923

10.  Environmental and Historical Determinants of Patterns of Genetic Differentiation in Wild Soybean (Glycine soja Sieb. et Zucc).

Authors:  Shui-Lian He; Yun-Sheng Wang; De-Zhu Li; Ting-Shuang Yi
Journal:  Sci Rep       Date:  2016-03-08       Impact factor: 4.379
