Literature DB >> 35554526

Population-specific, recent positive selection signatures in cultivated Cucumis sativus L. (cucumber).

Xinrui Lin¹, Ning Zhang¹, Hongtao Song¹, Kui Lin¹, Erli Pang¹.

Abstract

Population-specific, positive selection promotes the diversity of populations and drives local adaptations in the population. However, little is known about population-specific, recent positive selection in the populations of cultivated cucumber (Cucumis sativus L.). Based on a genomic variation map of individuals worldwide, we implemented a Fisher's combination method by combining 4 haplotype-based approaches: integrated haplotype score (iHS), number of segregating sites by length (nSL), cross-population extended haplotype homozygosity (XP-EHH), and Rsb. Overall, we detected 331, 2,147, and 3,772 population-specific, recent positive selective sites in the East Asian, Eurasian, and Xishuangbanna populations, respectively. Moreover, we found that these sites were related to processes for reproduction, response to abiotic and biotic stress, and regulation of developmental processes, indicating adaptations to their microenvironments. Meanwhile, the selective genes associated with traits of fruits were also observed, such as the gene related to the shorter fruit length in the Eurasian population and the gene controlling flesh thickness in the Xishuangbanna population. In addition, we noticed that soft sweeps were common in the East Asian and Xishuangbanna populations. Genes involved in hard or soft sweeps were related to developmental regulation and abiotic and biotic stress resistance. Our study offers a comprehensive candidate dataset of population-specific, selective signatures in cultivated cucumber populations. Our methods provide guidance for the analysis of population-specific, positive selection. These findings will help explore the biological mechanisms of adaptation and domestication of cucumber.

Entities: Chemical

Keywords: zzm321990 Cucumis sativus L; Fisher’s combination; cultivated cucumber; population-specific; recent positive selection

Mesh：

Year: 2022 PMID： 35554526 PMCID： PMC9258548 DOI： 10.1093/g3journal/jkac119

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.542

Introduction

Locating population-specific, positive selection from genetic variation is a vital and considerably challenging task in evolutionary biology. Positive selection drives local adaptation (Casillas and Barbadilla 2017) and may also help clarify the mechanisms of biological evolution. Positively selected genomic regions were related to domestication (Li ; Wang ), life habits (Fumagalli ), diseases (Wang ), and specialization (Acer ). Living in a new environment, changing environmental factors (Acer ) leads to a shift in the distribution of fitness effects for genomic variants (John and Seetharaman 2016). High-fitness selected variants spread rapidly and leave signatures in genomes. Research issues have utilized the signatures to detect positively selected genes that contribute to its adaption to the environments in Arabidopsis thaliana (Hancock ) and rice (Liu ). The signatures were also used to reveal domestication and breeding selection in crop (Wu ). Selection imprint signatures in genomes are used to detect targets that are selected, including a low genetic diversity, a shift in the site frequency spectrum of polymorphisms, and an excess of linkage disequilibrium (LD) (Stephan 2019). To capture the signatures, corresponding methods have been provided, including nucleotide diversity measures (Weir and Cockerham 1984; Chu and Wei 2020), site frequency spectrum-based approaches (Tajima 1989; Nielsen ; Hua ; Racimo 2016), and haplotype-based methods (Voight ; Sabeti ; Tang ; Anna ). Moreover, some of them are based on a single population, and others are based on the comparisons of multiple populations. However, research has reported that no single method is able to identify all of selective sweeps occurring at any given time (Hannah and Florian 2018); thus, combining several methods can greatly increase the power to pinpoint the selected region (Grossman ; Alexandra ). Haplotype-based methods, as powerful tools to detect recent positive selection, have developed many relevant metrics, such as integrated haplotype score (iHS) (Voight ), number of segregating sites by length (nSL) (Anna ), cross-population extended haplotype homozygosity (XP-EHH) (Sabeti ), and Rsb (Tang ). IHS and nSL based on a single population perform well in selecting early stages (Alexandra ), whereas XP-EHH and Rsb based on the comparison of 2 populations are powerful for beneficial alleles shortly before or at fixation (Sabeti ; Tang ; Alexandra ). These methods have been widely used to reveal adaption and domestication of crops. XP-EHH detected genes associated with plant defense against insects and herbivores in tea (Zhang ). In tomato, selective sweeps identified by iHS were related to fruit weight (Zhao ). Signatures of soft sweeps underlying determinate growth habit were found in soya bean by haplotype-based methods (Zhong ). Cucumber (Cucumis sativus L.) is an important vegetable crop mainly cultivated in temperate and tropical regions of the world (Che and Zhang 2019). It is indigenous to India and the history of its domestication dates back to more than 2,500 years ago (Qi ; Chomicki ). A previous study generated a genomic variation map of 115 cucumber lines sampled from 3,342 accessions worldwide and divided the 115 lines into 3 cultivated groups and 1 wild group (Qi ). In addition, common domestication sweeps were detected in the merged 3-in-1 cultivated group (Supplementary Fig. 1). Later, comparative analysis between wild and cultivated cucumbers was carried out using RNA-sequencing data (Abdel-Salam ; Yan ). Besides, 56 artificially selected cucumber inbred lines, where the genetic background was significantly different from the previous reported 115 lines, were assigned to group 1 and group 2, and selected regions were identified between group 1 and group 2 (Liu ). Although there have been some studies on selection in cultivated cucumbers, none of them use haplotype-based methods, which are more powerful to recent positive selection. Thus, our research connecting genotypic variants and local adaption remains meaningful. Here, we combined 4 haplotype-based approaches, which increased detection capabilities, and identified local adaptive signatures in each cultivated population. We attempted to address the specific selection in each cultivated population and furthermore investigated whether soft or hard sweeps are the common mode of adaptation in cultivated cucumber.

Materials and methods

Genomic variation

We downloaded the genomic variation from a previous study (Qi ), which had sampled a core collection of 115 cucumber lines that captured a large proportion of the genetic diversity. Overall, the whole-genome resequencing of all 115 lines was performed with average coverage of 95.2% and depth of 18.3×. All these 115 lines were divided into 4 geographic groups: Indian, East Asian, Eurasian, and Xishuangbanna groups. The Indian group containing 30 lines mainly from India was identified as the wild group. The other 3 groups only contained cultivated lines: (i) East Asian group: 37 lines mainly came from China, Korea, and Japan; (ii) Eurasian group: 29 lines came from central and western Asia, Europe, and United States; and (iii) Xishuangbanna group: 19 landraces cultivated in the Xishuangbanna region of tropical southwestern China. Population structure and phylogenetic reconstruction supported that the 3 cultivated groups are monophyletic (Qi ). The dataset encompassed 3,263,035 SNPs in chromosomes, and these SNPs were used for analysis. To ensure the data quality, a series of quality filters were applied. We excluded individuals with more than 15% of missing data. Moreover, we filtered out singletons and SNPs with a proportion of missing data greater than 10% (Zhao ). The process of quality control was applied using VCFtools v0.1.17 (Danecek ). The remaining dataset (Supplementary Table 1) was used for downstream analysis.

Identification of ancestral alleles

To calculate the values of iHS (Voight ) and nSL (Anna ), an ancestral state for each variant was necessary. We inferred the ancestral state for each variant utilizing principles of maximum parsimony (Langley ). First, we obtained 7-way, cucumber-based, pairwise alignments from the website http://cmb.bnu.edu.cn/cisRCNEs_cucurbitaceous/index.html, which contained 7 closely related species in Cucurbitaceae. Based on the phylogenetic tree of the 7 species from the website (Supplementary Fig. 2), the clade consisting of C.sativus, Cucumis melo, Citrullus lanatus, and Lagenaria siceraria was named the 4-way clade and the clade consisting of all 7 species was named the 7-way clade. Based on the phylogeny and whole genome alignments, we inferred the ancestral states. The ancestral allele state for each site in cucumber was determined if more than 2 species in the 4-way clade or 4 species in the 7-way clade were consistent with each other. In addition, the allele state in cucumber different from that in ancestors was set as “derived.” Sites with ancestral allele states that could not be determined by the above 2 clades were set as “missing.”

Inference of haplotypes

To detect selection signatures based on phase data, the inference of haplotypes was carried out in BEAGLE 5.0 (Sharon and Brian 2007). Considering that a long sliding window may cause a high phasing error rate, for each population, we set the distance corresponding to half of the maximum mean r2 value as 1 block according to our analysis of the LD decay using PopLDdecay v3.40 (Chi ).

Detection of selection signatures within populations

Signals of selection within populations were evaluated by 2 methods, iHS (Voight ) and nSL (Anna ). IHS is a statistic comparing the extended haplotype homozygosity (EHH) (Sabeti ) between derived and ancestral alleles within a population. NSL is a statistic measuring the length of a segment of homozygosity between derived and ancestral alleles in terms of the number of mutations in the remaining haplotypes in the dataset in the same region (Anna ). The values of iHS and nSL were calculated and standardized using the rehh package (v3.2.1) (Gautier ) and selscan v1.2.0a (Szpiech and Hernandez 2014) with default parameters, respectively. Phase data and ancestry status were required for both of the calculators, and the SNPs with “missing” ancestral allele states were ignored. Considering the frequency dependence of expected iHS and nSL values under neutrality (Tabangin ), standardization was performed for markers binned with respect to the derived allele frequency at the focal marker. Since the standardized values of iHS and nSL approximately follow a standard Gaussian distribution (Voight ; Anna ), for each method for each SNP, a P value relative to the null-hypothesis of neutral evolution was assigned.

Detection of selection signatures between populations

Signals of selection between the cultivated population and Indian population were evaluated using 2 methods, XP-EHH (Sabeti ) and Rsb (Tang ). XP-EHH compares the integrated EHH between populations. Rsb is a method that compares the lengths of haplotypes associated with the same allele between populations (Tang ). The XP-EHH and Rsb values were calculated and standardized using the rehh package (v3.2.1) (Gautier ). The formulas of XP-EHH and Rsb showed nondependence on ancestral status; therefore, for each cultivated-wild pair of populations, all polymorphic phased sites were considered. Since the standardized values of XP-EHH and Rsb follow an approximately standard Gaussian distribution as well (Sabeti ; Tang ), a similar P value was assigned for each SNP.

Combining selection signatures

We used a Fisher’s combination method (Fisher 1954), which had been applied in human (Arciero ), to combine the P values of the 4 methods used for detecting selection signatures. For each SNP, let P be the P value of the test statistic using method i, (i∈{1, …, 4}). The Fisher’s combined test statistic is equal to: which follows a chi-squared distribution with 2n degrees of freedom (Fisher 1954). We adjusted the P value for multiple tests by applying Benjamini–Hochberg correction (Benjamini and Hochberg 1995). Sites with a false discovery rate less than 0.01 were considered selective sites.

Functional annotation and enrichment analysis

We obtained genomic structural and functional annotation from CuGenDB (9930 v2) (Zheng ). Gene ontology (GO) enrichment and pathway enrichment analysis were performed using the web tools in CuGenDB. Pfam domains annotation of protein-coding genes were predicted by PfamScan (El-Gebali ) based on Pfam35.0.

Classification of selection sweeps

To classify the selective sweeps into soft or hard sweeps, we performed simulations for the hard sweep and soft sweep scenarios using discoal (Kern and Schrider 2016), a coalescent simulator able to generate population samples conditioning on the fixation of an allele due to hard or soft sweeps. For each cultivated population, a 200-kb region was simulated based on its demographic history inferred by (Qi ). A mutation rate of μ = 1 × 10−7 per generation (Qi ) was used. Sweep scenarios were simulated with a positive additive selection coefficient s ∼ U[0,1], and the time of fixation looking backward in time τ ∼ U[0,0.005]. Soft sweeps were simulated as a mutation that arose neutral and turned beneficial at a frequency e ∼ U[0,0.2]. Hard sweeps were simulated from a de novo mutation that was never neutral, that is, e = 0. The frequency at which the selection ended f ∼ U[e, 1]. We used 100,000 coalescent simulations for each scenario and a total of 600,000 simulations were performed. We calculated Bayes factors by taking the ratio of distribution density of observed H12 and H2/H1 values in 2 types of simulated data (Garud ). The python scripts for calculating H12 and H2/H1 values were provided on the website https://github.com/ngarud/SelectionHapStats (Garud ). Bayes factors for each observed H12 and H2/H1 took the ratio of the support of hard and soft sweep scenarios. The support value was the number of data set generating H12 and H2/H1 values with a Euclidean distance <0.1 from the observed H12 and H2/H1 values. If the Bayes factor >1, we classified the selective site as a hard sweep. In contrast, if the Bayes factor <1, we classified it as a soft sweep. If the Bayes factor was 1 or the support value was 0 for both hard and soft sweep, we considered the selective sweep as an “unknown.”

Results

Signatures of adaptation in cultivated populations

To identify genomic signatures of selective sweeps in cultivated populations of cucumber living worldwide, we used 4 haplotype-based approaches: iHS (Voight ), nSL (Anna ), XP-EHH (Sabeti ), and Rsb (Tang ). IHS and nSL compare the lengths of haplotypes between derived and ancestral alleles in a single population, while XP-EHH and Rsb compare the lengths of haplotypes between a cultivated population and the wild population. To infer haplotypes in each population, BEAGLE 5.0 (Sharon and Brian 2007) was applied. The slide window was determined by the r2 values used to measure LD decay. The r2 values were listed in Supplementary Table 2. The graph of LD decay in the 4 populations was shown in Supplementary Fig. 3. Finally, parameters in BEAGLE were obtained shown in Supplementary Table 3. Filtered by the quality of SNPs and by the data requirements for each method, the genomic SNPs used in each method differed (Supplementary Table 4). For the iHS and nSL methods, the ancestral state for each variant was required; some SNPs were failed to determine the ancestral state, therefore, there were fewer SNPs than those used by XP-EHH and Rsb. For each SNP, the P value was calculated from each method detecting significant departures from the null hypothesis of neutral evolution (Voight ; Sabeti ; Tang ). Using the above approaches, variants under selection with a P value threshold of 0.01 were detected in each cultivated population (Supplementary Table 4). More signatures of selection were identified by the methods based on 2 populations than by the methods based on a single population (Supplementary Figs. 4–6), which was mainly because of 2 causes: (i) more SNPs used by the methods comparing 2 populations and (ii) the significantly higher sensitivity of the methods based on 2 populations compared with the methods based on a single population (Supplementary Table 5). The overlaps of 4 approaches in 3 cultivated populations were similar (Supplementary Figs. 7–9). The intersection of XP-EHH and Rsb was the biggest, followed by the intersection of iHS and nSL. The overlapping sites of all 4 approaches were 38, 23, and 8 SNPs in East Asian, Eurasian, and Xishuangbanna population, respectively. Combining the P values from the 4 methods provides a concise way to integrate the above findings (Zaykin ). P values were merged by Fisher’s combination, and candidate SNPs under positive selection were detected with a Benjamini–Hochberg-adjusted P value less than 0.01. According to the genomic structural annotation including 22,976 protein-coding genes in chromosomes, the SNPs under positive selection were further divided into 2 categories: in gene regions and in intergenic regions. In total, we identified 360 variants distributed in 22 genes in the East Asian population, in which 13 SNPs located in 2 previously identified domestication sweeps (Qi ). Meanwhile, we found 2,248 variants located in 60 genes in the Eurasian population, in which 416 variants located in 9 selective regions detected by previous research (Qi ). In addition, we detected 3,867 variants located in 120 genes in the Xishuangbanna population, in which 91 sites found in 7 domesticated regions in a previous study (Qi ) (Table 1). In the 3 cultivated populations, most selected sites were located in the intergenic regions. The most selected sites were in the Xishuangbanna population. And the fewest selective sites were in the East Asian population (Table 1). This may be due to more genetic differentiation between Indian population and Xishuangbanna population than between Indian population and the others. The overlapping selective SNPs from a previous study were mainly located on chromosomes 1 and 5 in the 3 populations (Fig. 1).

Table 1.

Variants showing the strongest signatures of positive selection in the 3 cultivated populations.

Population	SNPs^a	Previously detected SNPs^b	Gene regions^c	Intergenic regions^d	Genes^e
East Asian	360	13	44	316	22
Eurasian	2,248	416	226	2,022	60
Xishuangbanna	3,867	91	812	3,055	120

Number of selective SNPs with Benjamini–Hochberg-adjusted P value < 0.01.

Number of selective SNPs detected by a previous study (Qi ).

Number of selective SNPs located in gene regions.

Number of selective SNPs located in intergenic regions.

Number of genes harboring selective SNPs under selection.

Fig. 1.

Manhattan plots of combined P values of local adaptation in the East Asian a), Eurasian b), and Xishuangbanna populations c). The blue horizontal line indicates the cutoff of the Benjamini–Hochberg-adjusted P value equal to 0.01, and the sites above are selective. The red dots represent selective variants located in domestication sweeps identified in a previous study (Qi ). Manhattan plots were generated using the qqman package (Turner 2018). Variants showing the strongest signatures of positive selection in the 3 cultivated populations. Number of selective SNPs with Benjamini–Hochberg-adjusted P value < 0.01. Number of selective SNPs detected by a previous study (Qi ). Number of selective SNPs located in gene regions. Number of selective SNPs located in intergenic regions. Number of genes harboring selective SNPs under selection. To investigate the specific sites under selection in each cultivated population, we obtained the uniquely selective sites in each cultivated population. Most selective sites were population specific, consistent with the hypothesis of independent origin of the 3 cultivated populations (Qi ; Chomicki ). There were 331 SNPs under specific selection in the East Asian population, 2,147 variants in the Eurasian population and 3,772 sites in the Xishuangbanna population (Fig. 2). In the East Asian population, we found 7 SNPs leading to the change of amino acids and 72 SNPs localized at potential promoters of genes (defined as the 2-kb region upstream of the transcription start site) (Schwope ). In Eurasian, 19 missense variants and 761 SNPs located in promoter regions were observed. In Xishuangbanna, 73 missense variants and 1,095 SNPs localized at gene promoters were observed. We also obtained 21 SNPs under selection shared by the 3 cultivated populations, which were all located in intergenic regions (8 of them localized at gene promoters).

Fig. 2.

Venn diagram for selective sites in cultivated populations. Variants with Benjamini–Hochberg-adjusted P values less than 0.01 were identified as selective sites. A Venn diagram was generated using the package VennDiagram v1.6.20 (Chen and Boutros 2011). We performed GO and pathway enrichment analysis to evaluate if there were any gene classes overrepresented. In the Xishuangbanna population, GO terms related to basic biological activities were observed (Table 2). In the East Asian population, a pathway related to tetrahydrofolate biosynthesis was enriched. In the Eurasian population, two pathway related to l-lysine biosynthesis was overrepresented. In the Xishuangbanna population, a pathway related to l-N- acetylornithine biosynthesis was enriched (Table 3).

Table 2.

GO terms obtained by enrichment analysis in the Xishuangbanna population.

GO term	Description	Ontology^a	Adjusted P value
GO: 1902903	Regulation of supramolecular fiber organization	BP	0.01147
GO: 0051493	Regulation of cytoskeleton organization	BP	0.02129
GO: 0003918	DNA topoisomerase type II (ATP-hydrolyzing) activity	MF	0.02981
GO: 0061505	DNA topoisomerase II activity	MF	0.02981
GO: 0016308	1-Phosphatidylinositol-4-phosphate 5-kinase activity	MF	0.04457
GO: 0019013	Viral nucleocapsid	CC	0.02420
GO: 0019028	Viral capsid	CC	0.02903
GO: 0019012	Virion	CC	0.03451
GO: 0044423	Virion part	CC	0.03451

Ontology covers 3 domains: BP means biological process, MF means molecular function, and CC means cell component.

Table 3.

Pathway enrichment analysis in the cultivated populations.

Population	Pathway ID	Pathway name	Adjusted P value
East Asian	PWY-6614	tetrahydrofolate biosynthesis	0.003
Eurasian	PWY-5097	l-Lysine biosynthesis VI	0.0285
Eurasian	PWY-724	Superpathway of l-lysine, l-threonine and l-methionine biosynthesis II	0.0330
Xishuangbanna	PWY-6922	l-N^δ-acetylornithine biosynthesis	0.0229

GO terms obtained by enrichment analysis in the Xishuangbanna population. Ontology covers 3 domains: BP means biological process, MF means molecular function, and CC means cell component. Pathway enrichment analysis in the cultivated populations.

Unique selective signatures in the East Asian population

In the East Asian population, we found 44 specific selective sites located in 22 genes and 287 sites distributed in intergenic regions (Supplementary Table 6). In the 287 intergenic sites, 70 sites located in potential promoter regions. These genes were related to reproduction, plant response and resistance, and biosynthesis (Supplementary Table 7). We found 2 genes related to reproduction, one of which is WRKY27 gene (Csa3G354510) with 3 selective nonsynonymous variants, influencing pollen viability (Mukhtar ). In addition, genes involved in plant response and resistance were detected, including the aquaporin (AQP) gene (Csa3G345890) which plays key roles in drought, flooding, nutrient availability, response to temperature and light conditions (Kaldenhoff and Fischer 2006), and a disease resistance response protein gene (Csa4G280640) in dirigent family with a selective synonymous variant. Moreover, we found some genes related to biosynthesis processes, such as the UDP-glycosyltransferase 1 (UGT1) gene (Csa3G608710) with a selective nonsynonymous variant, disrupting the biosynthesis of littorine and its tropane alkaloids derivatives (hyoscyamine and scopolamine) (Qiu ). These findings hinted that there was existing selection of reproduction and disease resistance, and local adaption for light and temperature in the East Asian population.

Unique selective signatures in the Eurasian population

In the Eurasian population, a total of 226 specifically selective sites were located in 60 genes, and 1,921 sites were located in intergenic regions (Supplementary Table 8). In the 1,921 intergenic sites, 743 sites located in potential promoter regions. We further found that the genes were involved in regulation of fruit length, development, response to environment, and DNA repair (Supplementary Table 9). We found a gene Csa6G177440 located in the short-fruit 1 locus, which significantly decreased fruit length (Wang ), and a gene, E3 ubiquitin-protein ligase MARCH2 (Csa3G186720), that regulates plant development, including root development, organ size decisions, flowering time, and plant–environment interactions, such as drought and ABA (Shu and Yang 2017). Moreover, genes involved in the response to the environment were observed. For instance, the sesquiterpene synthase (TST) gene (Csa3G021130) defends against herbivores by their volatile terpenes (Liu ), and the dehydration responsive element-binding transcription factor 1B (DREB) gene (Csa3G180260) enhances tolerance to freezing temperatures, drought, and high salinity (Agarwal ). Furthermore, gene Csa3G912890 with 2 selective nonsynonymous variants and gene Csa5G424880 with a selective nonsynonymous variant were related to DNA repair. These findings suggested that the selective genes in the Eurasian population were related to the shorter fruit length, response to heat, drought and salt stress, pathogens and herbivore resistance.

Unique selective signatures in the Xishuangbanna

In the Xishuangbanna population, we found 812 specifically selective sites located in 120 genes and 2,960 selective sites in the intergenic region (Supplementary Table 10). In the 2,960 intergenic sites, 960 sites located in potential promoter regions. The genes were involved in resisting stress and pathogens, regulating developmental processes, biosynthesis, and flesh thickness (Supplementary Table 11). Genes related to resistance stress and pathogens were under selection. For example, the thaumatin-like protein (TLP) gene (Csa3G003990), containing a selective synonymous variant located in the Thaumatin domain, enhances drought and salt stress tolerance (de Jesus-Pires ). The ABC transporter G family member (ABCG) gene (Csa6G434390) with 5 synonymous variants, of which 2 localized at ABC2 membrane family, is involved in ABA tolerance and plant resistance against Pseudomonas syringae pv. tomato DC3000 (Pst DC3000) in Arabidopsis (Ji ). Genes involved in both stress resistance and regulating developmental processes were detected. For example, the WRKY6 gene (Csa7G328830) with a selective nonsynonymous variant and a synonymous variant was detected serving functions in plant senescence processes, pathogen defence mechanisms, abiotic stress responses, and regulation of the accumulation of fatty acids during seed development (Song ). In addition, genes regulating developmental processes were found, such as the mandelate racemase/muconate lactonizing protein (DXS1) gene (Csa2G404880) with a selective nonsynonymous variant located in the MR_MLE_C domain, which has been proposed serving fruit carotenoid biosynthesis and is required for the development and survival of tomato plants (Garcia-Alcazar ), the DNA topoisomerase 1 (TOP1) gene (Csa3G063680), which modulates auxin-regulated root development in rice (Shafiq ), and Csa2G058670 gene controlling flesh thickness (Xu ). These results implied that local adaptation signatures in the Xishuangbanna population might be related to salt, temperature and bacterial pathogen resistance, root development, carotenoid biosynthesis, and thin fruit flesh.

Soft selective signatures common in cultivated cucumbers

A selective sweep arises if a beneficial allele occurs at some recent time in the past and goes to fixation (Stephan 2019). This classic selective sweep model was defined as “hard sweep” formed by the de novo mutation (Hermisson and Pennings 2005). In contrast, multiple adaptive alleles sweep through the population at the same time was referred to as the “soft sweep” (Messer and Petrov 2013), which can help population rapidly respond to novel environmental challenges, rather than wait for a novel mutation to arise (Schrider and Kern 2017). Research has observed that soft sweeps are the dominant mode of adaptation in maize (Beissinger ). Recent studies have reported that local adaptation and domestication indeed show signatures of such soft selective sweeps (Zhong ; Stetter ; Wu ). Therefore, we further wanted to investigate whether soft sweeps contribute to adaptations in cucumber. We compared the signatures of hard sweeps with those of soft sweeps in the cultivated populations. Based on the Bayes factor (Garud ), selective sites were divided into 3 groups: hard sweeps, soft sweeps and unknowns. We found that 61.03%, 42.43%, and 84.23% of sites are soft sweeps in the East Asian, Eurasian, and Xishuangbanna populations, respectively, which were significantly greater than those of hard sweeps except for Eurasian (Fig. 3). These results indicated that soft sweeps are the dominant mode of adaptation in the East Asian and Xishuangbanna.

Fig. 3.

Proportion of selective sweeps in cultivated populations. In each bar, the count and proportion of specifically selective sites for each sweep corresponding to the population are indicated.

Proportion of selective sweeps in cultivated populations. In each bar, the count and proportion of specifically selective sites for each sweep corresponding to the population are indicated. The genes containing hard or soft selection sweep signatures (Fig. 4) were involved in developmental regulation, biotic, and abiotic stress resistance (Table 4). However, in the certain population, there were some differences. For example, for hard sweeps, signatures in East Asian were not related to biotic stress resistance and signatures in Xishuangbanna were not related to abiotic stress resistance; for soft sweeps, selective sites in Eurasian were not involved in abiotic stress resistance.

Fig. 4.

Genes harboring soft and hard selective sites in cultivated populations. Tracks indicate the following: genes containing only hard selective sites in the East Asian (green), Eurasian (orange), and Xishuangbanna (blue) populations (A); hard selective variants in the East Asian population (B); soft selective variants in the East Asian population (C); hard selective variants in the Eurasian population (D); soft selective variants in the Eurasian population (E); hard selective variants in the Xishuangbanna population (F); soft selective variants in the Xishuangbanna population (G); and genes containing only soft selective sites in East Asian (green), Eurasian (orange), and Xishuangbanna (blue) populations (H). In the B–F tracks, dark triangle dots indicate selective sites located in gene regions, and light circle dots indicate selective sites located in intergenic regions. The plot was generated using Circos v0.69-9 (Krzywinski ).

Table 4.

Functions of genes capturing hard sweep and soft sweep signatures in the cultivated populations.

Population	Developmental regulation		Biotic stress resistance		Abiotic stress resistance
Population	Hard sweeps	Soft sweeps	Hard sweeps	Soft sweeps	Hard sweeps	Soft sweeps
East Asian	+	+	−	+	+	+
Eurasian	+	+	+	+	+	−
Xishuangbanna	+	+	+	+	−	+

“+” was used to represent genes containing the sweep signatures involved in the corresponding function in the population, and “−” was used for the reverse.

Discussion

In this study, we performed a comprehensive genome scan of genetic variation in cultivated cucumbers worldwide, attempting to find signals of population-specific selection in each cultivated population. Therefore, 4 haplotype-based approaches, iHS, nSL, XP-EHH, and Rsb, were chosen, and Fisher’s method was used to combine separate P values of the 4 methods to increase the power to pinpoint the selected sites (Alexandra ). We finally identified 360 variants distributed in 22 genes in the East Asian populations, 2,248 variants located in 60 genes in the Eurasian population, and 3,867 variants in 120 genes in the Xishuangbanna population. We further surveyed uniquely selective sites in a certain population. We observed that genes involved in population-specific adaptation were related to the response to abiotic and biotic stress, developmental processes, and traits of fruit in the 3 cultivated populations. In addition, soft sweeps were found common in adaptation, and hard sweep and soft sweep signatures pointed to similar adaptions. We combined the P values of the 4 methods to reduce false positivity and bias in a certain method (Grossman ). Referring to previous research results, the origins of domesticated cucumber were traced back to more than 2,500 years (Qi ; Chomicki ). Based on the knowledge of haplotype-based calculator fitting for the identification of completed or ongoing selection in genomes (Szpiech and Hernandez 2014), EHH-based methods were chosen. When the beneficial allele frequency was low (0.1–0.3), which represented an ongoing selection, iHS and nSL performed well (Fan ). XP-EHH and Rsb were powerful for detecting the selection of high-frequency or fixed alleles in one population but not in another population (Sabeti ; Tang ). In addition, the iHS and the XP-EHH statistics were complementary to each other and the combination could increase the power of pinpointing the selected sites (Vitti ). Compared with a previous study (Qi ), only 2, 9, and 7 domestication sweep regions contained the selective sites identified by our analysis in the East Asian, Eurasian, and Xishuangbanna populations, respectively. The different results may be caused by different computing strategies and methods. In the previous study, they combined the East Asian, Eurasian, and Xishuangbanna populations into a single cultivated population. Then, methods based on site frequency spectrum were used. Regions with significantly lower diversity in cultivated cucumbers compared to wild cucumbers (the top 5% of π/π values) were detected. Meanwhile, regions with the top 5% of XP-CLR values were identified. These shared regions between the analysis of genetic diversity and XP-CLR analysis were considered to be selective sweeps. In our analysis, we scanned the whole genome for each cultivated population by performing 4 haplotype-based approaches, including iHS, nSL, XP-EHH, and Rsb. Finally, we used Fisher’s combination method to combine the P values of the 4 methods, which can greatly increase the power to pinpoint the selected regions (Zaykin ). In addition, the selective signatures persist for only ∼0.01 Ne generations in the haplotype structure, but for an order of magnitude longer time in the site frequency spectrum (Booker ); thus, we were more likely to detect relatively recent signatures compared with the previous study. We identified selective signatures in cucumber cultivated worldwide and attempted to provide the insight into the footprints of natural selection and domestication. Most genes involved in local adaptation signatures were observed to related to the response of the environment. For example, population-specific, selective signatures were involved in light, temperature resistance, and reproduction in the East Asian population, in light, heat stress, pathogen and herbivore resistance, and DNA repair in the Eurasian population, and in various microbial pathogens, salt and drought resistance in the Xishuangbanna population. These functions embodied the local adaptation to latitudes, daylight, soil conditions, and abiotic and biotic environments. In addition, the selective footprints associated with domestication were observed. For instance, selective gene related to shorter fruit length was detected in Eurasian; gene controlling flesh thickness was observed in the Xishuangbanna population. With the improvement of the genome functional annotation, we will completely understand the selection during cucumber domestication. Selective sites were divided into soft or hard sweeps based on Bayes factors proposed by Garud . In East Asian and Xishuangbanna, soft sweeps were common, consistent with previous studies in maize (Beissinger ). On the contrary, hard sweeps were common in Eurasian. Cucumber is monoecious crop with separate male and female flowers on the same plant. It is highly dependent on insect-mediated pollination but few dependent on wind and self-pollination is inefficient (Lowenstein ). These characteristics of pollination system increase genetic diversity (Fattorini and Glover 2020) and provide standing variants for soft sweeps. The population size at bottleneck in the Eurasian population was larger than those in East Asian and Xishuangbanna (Qi ), implying that standing variation enabled the Eurasian population to rapidly adapt the environment (Chaturvedi ); however, duration of bottleneck in Eurasian was longer than those in East Asian and Xishuangbanna (Qi ). These hinted that standing variation might contribute little to adaption of the environment and the Eurasian population recovery (Orr and Unckless 2014; Schrider and Kern 2017). Therefore, the proportion of soft sweeps in the Eurasian population was lower than those of other 2 cultivated populations. Genes covering hard and soft sweep signatures were involved in pathogen resistance, response to the environment and developmental regulation. In conclusion, we found population-specific, recent selective signatures in the 3 cultivated populations. These selective sites are involved in development, response, resistance, and biosynthesis. We further observed that soft sweeps were common for cultivated populations in East Asian and Xishuangbanna, but the opposite pattern was observed in Eurasian, where hard seeps were common. Our methods provide guidance for the analysis of population-specific positive selection. Our results provide a comprehensive dataset of population-specific, selective signatures in cultivated cucumber. Importantly, these findings will help to further improve the experimental understanding of the biological mechanisms of local adaptation and domestication.

Data availability

Supplementary Figs. 1–9 and Supplementary Tables 1–11 are available at figshare https://doi.org/10.6084/m9.figshare.19322318. Code used to combine P values using Fisher's combination can be found at https://github.com/CMB-BNU/fisher_combination. Supplementary Fig. 1: Comparison of our work and previous (Qi ) in the statistics calculation strategy between populations. Supplementary Fig. 2: Phylogenetic tree of 7 closely related species in Cucurbitaceae. Supplementary Fig. 3: Decay of linkage disequilibrium (LD) in 4 populations. Supplementary Fig. 4: Manhattan plots of P values of iHS (A), nSL (B), XP-EHH (C), and Rsb (D) in East Asian. Supplementary Fig. 5: Manhattan plots of P values of iHS (A), nSL (B), XP-EHH (C), and Rsb (D) in Eurasian. Supplementary Fig. 6: Manhattan plots of P values of iHS (A), nSL (B), XP-EHH (C) and Rsb (D) in Xishuangbanna. Supplementary Fig. 7: Venn diagram for selective sites in East Asian. Supplementary Fig. 8: Venn diagram for selective sites in Eurasian. Supplementary Fig. 9: Venn diagram for selective sites in Xishuangbanna. Supplementary Table 1: Individuals and SNPs kept after quality filtering. Supplementary Table 2: Decay of linkage disequilibrium (LD), measured by r2 values, in 4 populations. Supplementary Table 3: The parameters in BEAGLE5.0 used to calculate haplotypes in each population. Supplementary Table 4: Signatures showing selection detected by the 4 methods in the cultivated populations. Supplementary Table 5: P values of the tests for equality of the proportion of selective signatures detected by each 2 methods in each cultivated population. Supplementary Table 6: States of selective SNPs only detected in the East Asian population. Supplementary Table 7: Functions of genes containing selective SNPs in the East Asian population. Supplementary Table 8: States of selective SNPs only detected in the Eurasian population. Supplementary Table 9: Functions of genes containing selective SNPs in the Eurasian population. Supplementary Table 10: States of selective SNPs only detected in the Xishuangbanna population. Supplementary Table 11: Functions of genes containing selective SNPs in the Xishuangbanna population.

74 in total

1. Signatures of soft sweeps across the Dt1 locus underlying determinate growth habit in soya bean [Glycine max (L.) Merr.].

Authors: Limei Zhong; Qiaomei Yang; Xin Yan; Chao Yu; Liu Su; Xifeng Zhang; Youlin Zhu
Journal: Mol Ecol Date: 2017-08-02 Impact factor: 6.185

Review 2. E3 Ubiquitin Ligases: Ubiquitous Actors in Plant Development and Abiotic Stress Responses.

Authors: Kai Shu; Wenyu Yang
Journal: Plant Cell Physiol Date: 2017-09-01 Impact factor: 4.927

3. Response of wild bees (Hymenoptera: Apoidea: Anthophila) to surrounding land cover in Wisconsin pickling cucumber.

Authors: D M Lowenstein; A S Huseth; R L Groves
Journal: Environ Entomol Date: 2012-06 Impact factor: 2.377

4. Genomic footprints of sorghum domestication and breeding selection for multiple end uses.

Authors: Xiaoyuan Wu; Yuanming Liu; Hong Luo; Li Shang; Chuanyuan Leng; Zhiquan Liu; Zhigang Li; Xiaochun Lu; Hongwei Cai; Huaiqing Hao; Hai-Chun Jing
Journal: Mol Plant Date: 2022-01-05 Impact factor: 13.164

5. Genome-wide analysis on the maize genome reveals weak selection on synonymous mutations.

Authors: Duan Chu; Lai Wei
Journal: BMC Genomics Date: 2020-04-29 Impact factor: 3.969

6. On detecting incomplete soft or hard selective sweeps using haplotype structure.

Authors: Anna Ferrer-Admetlla; Mason Liang; Thorfinn Korneliussen; Rasmus Nielsen
Journal: Mol Biol Evol Date: 2014-02-18 Impact factor: 16.240

7. QTL mapping of cucumber fruit flesh thickness by SLAF-seq.

Authors: Xuewen Xu; Lu Lu; Biyun Zhu; Qiang Xu; Xiaohua Qi; Xuehao Chen
Journal: Sci Rep Date: 2015-10-28 Impact factor: 4.379

8. The Pfam protein families database in 2019.

Authors: Sara El-Gebali; Jaina Mistry; Alex Bateman; Sean R Eddy; Aurélien Luciani; Simon C Potter; Matloob Qureshi; Lorna J Richardson; Gustavo A Salazar; Alfredo Smart; Erik L L Sonnhammer; Layla Hirsh; Lisanna Paladin; Damiano Piovesan; Silvio C E Tosatto; Robert D Finn
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

9. Haplotype-resolved genome assembly provides insights into evolutionary history of the tea plant Camellia sinensis.

Authors: Xingtan Zhang; Shuai Chen; Longqing Shi; Daping Gong; Shengcheng Zhang; Qian Zhao; Dongliang Zhan; Liette Vasseur; Yibin Wang; Jiaxin Yu; Zhenyang Liao; Xindan Xu; Rui Qi; Wenling Wang; Yunran Ma; Pengjie Wang; Naixing Ye; Dongna Ma; Yan Shi; Haifeng Wang; Xiaokai Ma; Xiangrui Kong; Jing Lin; Liufeng Wei; Yaying Ma; Ruoyu Li; Guiping Hu; Haifang He; Lin Zhang; Ray Ming; Gang Wang; Haibao Tang; Minsheng You
Journal: Nat Genet Date: 2021-07-15 Impact factor: 38.330

10. Open chromatin in grapevine marks candidate CREs and with other chromatin features correlates with gene expression.

Authors: Rachel Schwope; Gabriele Magris; Mara Miculan; Eleonora Paparelli; Mirko Celii; Aldo Tocci; Fabio Marroni; Alice Fornasiero; Emanuele De Paoli; Michele Morgante
Journal: Plant J Date: 2021-08-06 Impact factor: 6.417