A high content of seed glucosinolates and their degradation products imposes anti-nutritional effects on livestock; therefore, persistent efforts are made to reduce the seed GSL content to increase the commercial value of rapeseed meal. Here, we dissected the genetic structure of SGC by genome-wide association studies (GWAS) combined with transcriptome-wide association studies (TWAS). Fifteen reliable quantitative trait loci (QTLs) were identified to be associated with the reduced SGC in modern B. napus cultivars by GWAS. Analysis of the selection strength and haplotypes at these QTLs revealed that low SGC was predominantly generated by the co-selection of qGSL.A02.2, qGSL.C02.1, qGSL.A09.2, and qGSL.C09.1. Integration of the results from TWAS, comprehensive bioinformatics, and POCKET algorithm analyses indicated that BnaC02.GTR2 (BnaC02g42260D) is a candidate gene underlying qGSL.C02.1. Using CRISPR/Cas9-derived Bna.gtr2s knockout mutants, we experimentally verified that both BnaC02.GTR2 and its three paralogs positively regulate seed GSL accumulation but negatively regulated vegetative tissue GSL contents. In addition, we observed smaller seeds with higher seed oil content in these Bna.gtr2 mutants. Furthermore, both RNA-seq and correlation analyses suggested that Bna.GTR2s might play a comprehensive role in seed development, such as amino acid accumulation, GSL synthesis, sugar assimilation, and oil accumulation. This study unravels the breeding selection history of low-SGC improvement and provides new insights into the molecular function of Bna.GTR2s in both seed GSL accumulation and seed development in B. napus.
A high content of seed glucosinolates and their degradation products imposes anti-nutritional effects on livestock; therefore, persistent efforts are made to reduce the seed GSL content to increase the commercial value of rapeseed meal. Here, we dissected the genetic structure of SGC by genome-wide association studies (GWAS) combined with transcriptome-wide association studies (TWAS). Fifteen reliable quantitative trait loci (QTLs) were identified to be associated with the reduced SGC in modern B. napus cultivars by GWAS. Analysis of the selection strength and haplotypes at these QTLs revealed that low SGC was predominantly generated by the co-selection of qGSL.A02.2, qGSL.C02.1, qGSL.A09.2, and qGSL.C09.1. Integration of the results from TWAS, comprehensive bioinformatics, and POCKET algorithm analyses indicated that BnaC02.GTR2 (BnaC02g42260D) is a candidate gene underlying qGSL.C02.1. Using CRISPR/Cas9-derived Bna.gtr2s knockout mutants, we experimentally verified that both BnaC02.GTR2 and its three paralogs positively regulate seed GSL accumulation but negatively regulated vegetative tissue GSL contents. In addition, we observed smaller seeds with higher seed oil content in these Bna.gtr2 mutants. Furthermore, both RNA-seq and correlation analyses suggested that Bna.GTR2s might play a comprehensive role in seed development, such as amino acid accumulation, GSL synthesis, sugar assimilation, and oil accumulation. This study unravels the breeding selection history of low-SGC improvement and provides new insights into the molecular function of Bna.GTR2s in both seed GSL accumulation and seed development in B. napus.
Canola‐type rapeseed (Brassica napus, AACC, 2n = 38) is currently one of the main oil crops and a promising alternative to soybean source protein feed in the world (Chao et al., 2017; Gatrell et al., 2014; Wang et al., 2018). However, earlier unimproved rapeseed cultivars contain high seed glucosinolate (GSL) concentrations, which reduce the nutritional value and feed quality of the seed (Schmidt and Bancroft, 2011). ‘Bronowski,’ a feed landrace used in Poland in the late 1960s, provided the only germplasm source for breeding seed GSL content (SGC) in B. napus (Agnihotri et al., 2007). Subsequently, with the discovery of low‐erucic cultivar ‘Liho’, the ‘double‐low’ (low seed erucic acid and GSL content) B. napus varieties have been successfully developed and quickly popularized in major B. napus growing countries since the 1990s (Agnihotri et al., 2007; Burton et al., 2003a,b; Gupta and Pratap, 2007). However, glucosinolates (GSLs) are important defence compounds for resistance to herbivores and pathogens in plants (Wittstock et al., 2003). Reportedly, the efforts made to reduce the GSL contents in seeds have concurrently reduced the GSL contents in leaves and other tissues, resulting in an overall decline in disease and stress resistance in B. napus (Li et al., 1999; Mithen, 1992). Therefore, tissue‐specific control of GSL accumulation is of great significance for the quality improvement of B. napus (Nour‐Eldin et al., 2012).The GSL biosynthesis pathway has been widely studied in the model plant Arabidopsis thaliana and has evolved to produce more than 130 GSL structures with multiple amino acid‐derived side chains (Agerbirk and Olsen, 2012; Fahey et al., 2001). Based on different precursor amino acids for side chains, GSL can be classified as aliphatic (methionine, alanine, valine, leucine, or isoleucine), aromatic (tyrosine or phenylalanine), and indolic (tryptophan) GSL (Blazevic et al., 2020; Clarke, 2010). GSL biosynthesis can generally be divided into three stages: extension of the side chain, formation of the core structure, and modification of the R‐side chain (Barco and Clay, 2019; Fahey et al., 2001). Many gene families including BCATs (Lachler et al., 2015; Schuster et al., 2006), MAMs (Redovnikovic et al., 2012; Textor et al., 2007), CYP79 family (Hansen et al., 2001; Reintanz et al., 2001; Tantikanjana et al., 2004; Zang et al., 2008), CYP83 family (Bak and Feyereisen, 2001; Hemm et al., 2003; Zang et al., 2008) and AOPs (Hall et al., 2001; Kliebenstein et al., 2001b) participate in these processes. Several transcription factors have also been identified as important harbours in GSL biosynthesis. The R2R3‐MYB transcription factors play an exclusive role in aliphatic and indole GSL biosynthesis (Burow et al., 2010; Gigolashvili et al., 2007). Disruption of HAG1/MYB28 and HAG3/MYB29 genes resulted in extremely low aliphatic GSL levels (Jules et al., 2008; Sonderby et al., 2007). Knockout and overexpression of ATR1/MYB34, HIG1/MYB51, and HIG2/MYB122 genes caused significant changes in the indolic GSL content (Celenza et al., 2005; Frerigmann and Gigolashvili, 2014). Moreover, MYC2/MYC3/MYC4 transcription factors have been reported to indirectly modulate GSL synthesis through the regulation of R2R3‐MYB transcription factors (Gigolashvili et al., 2009; Schweizer et al., 2013; Zimmermann et al., 2010).GSLs are mainly synthesized in nutrient‐rich tissues, such as rosette leaves and silique walls, and then relocated to seed embryos by transporters as a non‐uniform distribution pattern through the phloem (Chen et al., 2001; Du and Halkier, 1998; Petersen et al., 2002). Recently, in Arabidopsis, two aliphatic GSL‐specific bidirectional transporters, namely, GTR1 (also known as NPF2.10) and GTR2 (also known as NPF2.11) have been identified (Andersen et al., 2013; Nour‐Eldin et al., 2012), and their role in long‐distance transport of aliphatic GSL has been demonstrated in gtr1gtr2 double mutants—in the mutants, the accumulation of aliphatic GSL in root and leaf margins was strongly repressed (Madsen et al., 2014), and GSL‐free seeds were successfully obtained (Nour‐Eldin et al., 2012). Additionally, another GSL‐specific transporter, GTR3 (also known as NPF2.9), together with GTR1 and GTR2, has been demonstrated to import indole GSL into storage cells (Jorgensen et al., 2017). However, most GSLs in Arabidopsis seeds are aliphatic GSLs, and indolic GSLs account for about 2% (Brown et al., 2003). In B. napus, multiple orthologs of GTR1 and GTR2 have been identified because of their highly duplicated genome status (Chalhoub et al., 2014; Song et al., 2020). In Brassica rapa, seven GTR homologs have been identified; however, the ethyl methanesulfonate‐based mutation of BrGTR2A2 only has been shown to reduce the average SGC by 67 ± 5%. Similarly, mutation of four of the twelve GTR homologs in Brassica juncea has reduced the average SGC in the quadruple mutant (2A1/2A2/2B1/2B2) by 62 ± 10% (Nour‐Eldin et al., 2017). In B. napus, eight Bna.GTR2 genes have been identified and annotated, of which three genes were silent in the seeds of double‐low varieties and the remaining five genes were moderately expressed (Lu et al., 2019); however, associative transcriptome analysis showed that only the expression level of BnaA02.GTR2 in young leaves was related to the accumulation of GSL in seeds (Lu et al., 2014). These findings suggest that the different GTR2 paralogs seem to possess a conserved function in GSL transportation; however, their biological functions remain to be carefully defined in B. napus. Therefore, to achieve the goal of tissue‐specific control of GSL accumulation, it is necessary to understand the source–sink relationship of GSLs and evaluate the availability of GSL transporters, especially GTR2.The development and application of genomic tools and techniques, such as QTL mapping (Brondani et al., 2002), genome‐wide association study (GWAS; Lijun et al., 2019), and transcriptome‐wide association study (TWAS; Tang et al., 2020), have contributed to a better understanding of gene regulatory networks and the genetic mechanisms underlying complex quantitative traits in crops (Yadava et al., 2013). In B. napus, GWAS or linkage association mapping methods have been used to identify relevant QTLs that regulate GSL biosynthesis and accumulation (Feng et al., 2012; Liu et al., 2020; Lu et al., 2014). Two to three major QTLs were found to control the SGC of several bi‐parental mapping populations (Howell et al., 2003; Toroser et al., 1995; Zhao and Meng, 2003). Subsequently, Feng et al. (2012) identified 71 seed‐specific metabolite QTLs for GSL compounds. Furthermore, 43 QTLs for SGC were mapped over seven environments in Germany and China in a doubled haploid population in B. napus (Fu et al., 2015). With the rapid development of the new biotechnology, especially GWAS and TWAS, a considerable progress in the exploration of seed GSL accumulation in B. napus has been made in recent years. Through associative transcriptomics, Lu et al. (2014) identified 10 SNPs closely linked to genes related to SGC. Furthermore, Wang et al. (2018) identified 49 loci related to SGC using high‐throughput resequencing. Based on these findings, we hypothesize that combining genomic and transcriptomic approaches could unravel the genetic basis of SGC and provide insights into the application of transport engineering in B. napus breeding.Therefore, this study aims to comprehensively dissect the genetic basis of SGC at both the genomic and transcriptomic levels in B. napus by GWAS, TWAS, and co‐expression network analysis. We identified 15 QTLs, 1854 genes, and regulatory networks associated with SGC. In addition, BnaC02.GTR2 was found to be a key gene influencing seed GSL accumulation as well as multiple seed traits.
Results
GWAS deciphered QTLs associated with SGC in B. napus
To determine the QTLs associated with SGC of B. napus, SGC of 505 B. napus accessions (Tang et al., 2020) was examined in Wuhan (WH) for 3 years (2017–2019). The Pearson coefficients of determination (R
2) of seed GSL phenotype data estimated using the best linear unbiased prediction (BLUP) method were stable in WH 2017–2019 (0.87–0.91) with improved R
2 between the years (0.94–0.98; Figure S1).We performed GWAS for SGC using a linear mixed model (LMM) in the genome‐wide efficient mixed‐model association (GEMMA) software (Buss et al., 2016; Figure S2). Significant association signals were detected using the threshold of 6.91 × 10−7 of BLUP value in the data of WH 2017–2019 (Dataset S1). Finally, 15 QTLs mainly located on chromosomes A02, A09, C02, and C09 were identified as stable QTLs, detected at least twice in 3 years (Datasets S2 and S1, Figure 1A). Of these QTLs, qGSL.A02, qGSL.A09, qGSL.C02, qGSL.C07, and qGSL.C09 were reported in previous studies (Feng et al., 2012; Harper et al., 2012; Liu et al., 2020). Approximately, in the 150 kb region covered by the 15 QTLs, we identified 669 genes as GWAS‐associated candidate genes (Dataset S3).
Figure 1
Genome‐ and transcriptome‐wide association studies of SGC. (a) Manhattan plots of GWAS for SGC. The BLUP values of SGC for 3 years (2017–2019) were used in GWAS. QTLs identified at least twice were marked in the plots. (b) Manhattan plot of TWAS results (FDR < 0.05) at 20 DAF for SGC. (c) Manhattan plot of TWAS results (FDR < 0.05) at 40 DAF for SGC. In (b) and (c), each dot represents a gene that was tested, and the golden dots indicate genes that are significantly associated with SGC. The position of the genome is plotted on the X‐axis and the log‐transformed FDR values of association between gene expression and SGC are plotted on the Y‐axis. The genes above the bold black line are positively associated with SGC, and the genes shown below are negatively associated. The dashed grey horizontal lines represent the significance level. (d) Venn diagram of overlapped genes significant in TWAS at 20 and 40 DAF. (e) GO enrichment analysis of significant genes identified by TWAS at 20 DAF. The count number represents the number of genes enriched in a process, and the colour represents the range of P‐value.
Genome‐ and transcriptome‐wide association studies of SGC. (a) Manhattan plots of GWAS for SGC. The BLUP values of SGC for 3 years (2017–2019) were used in GWAS. QTLs identified at least twice were marked in the plots. (b) Manhattan plot of TWAS results (FDR < 0.05) at 20 DAF for SGC. (c) Manhattan plot of TWAS results (FDR < 0.05) at 40 DAF for SGC. In (b) and (c), each dot represents a gene that was tested, and the golden dots indicate genes that are significantly associated with SGC. The position of the genome is plotted on the X‐axis and the log‐transformed FDR values of association between gene expression and SGC are plotted on the Y‐axis. The genes above the bold black line are positively associated with SGC, and the genes shown below are negatively associated. The dashed grey horizontal lines represent the significance level. (d) Venn diagram of overlapped genes significant in TWAS at 20 and 40 DAF. (e) GO enrichment analysis of significant genes identified by TWAS at 20 DAF. The count number represents the number of genes enriched in a process, and the colour represents the range of P‐value.
TWAS unravelled the molecular basis of seed GSL accumulation
Twenty days after flowering (DAF) and 40 DAF are critical developmental stages for seed filling and embryo growth, with the accumulation of GSL in B. napus (Gijzen et al., 1989; Larden and Triboi‐Blondel, 1994; Murphy and Cummins, 1989; Yu et al., 2010). To further explore the genetic basis of seed GSL accumulation, we performed TWAS using the seed transcriptome data of two populations comprising 309 accessions at 20 DAF and 274 accessions at 40 DAF (Tang et al., 2020). We found 851 and 1327 genes significantly associated with SGC at 20 and 40 DAF (false discovery rate, FDR < 0.05; Figure 1b,c, Datasets S4 and S5), which contained 44 and 8 known GSL‐associated genes, respectively (Datasets S6 and S7). Among the significantly associated genes, 324 genes were identified simultaneously in both stages (Figure 1D). Moreover, many GSL‐associated genes identified here were underlying in several major QTLs, such as qGSL.A09.2, qGSL.C02.1, and qGSL.C09.1 (Figure 1b,c).We further performed gene ontology (GO) enrichment analysis of genes significant in TWAS (FDR < 0.05) at 20 DAF. The results showed that many GSL‐associated genes participated in the GSL biosynthesis process, phosphorylation, and hydrogen sulphide biosynthetic process (Figure 1e; Dataset S8). Genes encoding transporters, such as GTR2 (AT5G62680) and BAT5 (AT4G12030), transcription factors including HAG1/MYB28 (AT5G61420) and MYC3 (AT5G46760), as well as cytochrome P450 family members, CYP83A1 (AT4G13770) and CYP79F1 (AT1G16410), were positively associated with SGC at 20 DAF (Dataset S6). However, a few genes such as PEX14 (AT5G62810), TWD 40‐2 (AT5G24710), and NADP (AT5G62420) were negatively associated with SGC at 40 DAF (Dataset S7), and these GSL‐associated genes at this stage were mainly involved in several biological processes such as starch catabolism, arginine catabolism and regulation of DNA‐templated transcription (Figures S3A,B, Datasets S9 and S10). Notably, both GSL synthesis and GSL transport‐related genes, HAG1/MYB28, BAT5, and GTR2 were identified to be positively associated with SGC at 20 and 40 DAF (Datasets S6 and S7). These results indicate that many genes involved in multiple biological processes affect the accumulation of GSL during seed development in B. napus.
BnaC02.GTR2 is likely a key gene for seed GSL accumulation in B. napus
Genes with the same or similar functions can usually be divided into modules based on their expression patterns, and analysis of the relationship between modules and traits could help discover key genes controlling the trait (Ma et al., 2013). Analysis of the relationship between modules and trait can help discover the key genes controlling the trait. We used weighted gene co‐expression network analysis (WGCNA) to identify highly synergistic sets of genes for independent modules from GWAS and TWAS results (DiLeo et al., 2011; Langfelder and Horvath, 2008). It has been shown that scale‐free networks can make gene sets more to identify biologically more meaningful gene sets (Hein et al., 2006). Therefore, we estimated the best soft‐thresholding power (β) value for constructing the scale‐free network at 20 and 40 DAF. The β that allowed scale independence to exceed 0.85 for the first time was used as a threshold to classify the gene modules (Figure S4). At 20 and 40 DAF, 15 and 11 co‐expression modules (β20DAF = 14, β40DAF = 8) were divided for further analysis (Figure 2a,b). At 20 DAF, three modules were significantly positively correlated with SGC (red: r = 0.63, P = 1 × 10−35; brown: r = 0.54, P = 1 × 10−24; midnight blue: r = 0.34, P = 6 × 10−10; Dataset S11). Among them, the red module contained GTR2, PEX14, and other genes related to the transport of some other metabolites. The brown module contained genes related to GSL synthesis, such as BAT5 and IPMI2 (AT2G43100). At 40 DAF, two modules were significantly positively correlated with SGC (black: r = 0.69, P = 2 × 10−39; purple: r = 0.34, P = 5 × 10−9; Dataset S12). GTR2 and PEX14 are also present in the black module. Additionally, we found that the gene significance (GS) of these genes was very high in two stages (BnaC02.GTR2: GS20 DAF = 0.64, GS40 DAF = 0.55; BnaA09.PEX14 (BnaA09g06240D): GS20 DAF = 0.56, GS40 DAF = 0.56; Figure 2c, Datasets S13 and S14).
Figure 2
BnaC02.GTR2 is a key gene for seed SGC accumulation. (a) WGCNA analysis of genes/modules with SGC trait at 20 DAF. (b) WGCNA analysis of genes/modules with SGC trait at 40 DAF. In (a) and (b), Hexagon represents SGC, rhombuses represent modules, and circles represent the top 5 genes with the highest absolute GS values in a module. Shape size represents the absolute value of gene/module correlation with SGC. Orange indicates that the module/gene is positively correlated with SGC. Green indicates that the module/gene is negatively correlated with SGC. Grey indicates that the module/gene is not correlated with the trait. (c) Co‐localized gene functional score and phenotypic correlation analysis. Each dot represents a gene and the size of the dot is the value of 10/function score ranking. Different colours represent different QTLs. The GS values at 20 and 40 DAF are plotted on X‐ and Y‐axes, respectively. (d) Box plots for SGC based on the haplotypes of variations in gene region and upstream 2 kb region of BnaC02.GTR2. (e) Correlation between SGC and expression levels of BnaC02.GTR2 at 20 DAF. (f) Correlation between SGC and expression levels of BnaC02.GTR2 at 40 DAF. Each point in (e) and (f) represents an accession. The horizontal axis represents the scaled gene expression values, and the vertical axis represents the SGC corresponding to the accessions.
BnaC02.GTR2 is a key gene for seed SGC accumulation. (a) WGCNA analysis of genes/modules with SGC trait at 20 DAF. (b) WGCNA analysis of genes/modules with SGC trait at 40 DAF. In (a) and (b), Hexagon represents SGC, rhombuses represent modules, and circles represent the top 5 genes with the highest absolute GS values in a module. Shape size represents the absolute value of gene/module correlation with SGC. Orange indicates that the module/gene is positively correlated with SGC. Green indicates that the module/gene is negatively correlated with SGC. Grey indicates that the module/gene is not correlated with the trait. (c) Co‐localized gene functional score and phenotypic correlation analysis. Each dot represents a gene and the size of the dot is the value of 10/function score ranking. Different colours represent different QTLs. The GS values at 20 and 40 DAF are plotted on X‐ and Y‐axes, respectively. (d) Box plots for SGC based on the haplotypes of variations in gene region and upstream 2 kb region of BnaC02.GTR2. (e) Correlation between SGC and expression levels of BnaC02.GTR2 at 20 DAF. (f) Correlation between SGC and expression levels of BnaC02.GTR2 at 40 DAF. Each point in (e) and (f) represents an accession. The horizontal axis represents the scaled gene expression values, and the vertical axis represents the SGC corresponding to the accessions.To identify the key genes controlling the SGC of B. napus, we extracted 74 genes significant in TWAS localized in SGC‐related QTL intervals at two stages. Of these 74 genes, 39 genes, including the SGC‐related genes orthologous to GTR2, HAG1/MYB28, APO3 (AT5G61930), TAF5 (AT5G25150), PYRD (AT5G23300), and ORE14 (AT5G62000), were present in 15 stable QTLs (Figure 2c; Dataset S15). The POCKET algorithm, which integrates multi‐omics data and is based on machine learning, can prioritize candidate genes in QTL intervals (Tang et al., 2020). We used POCKET to analyse the functional ranking of co‐localized genes in qGSL.C02.1. BnaC02.GTR2 with the highest GS at 20 and 40 DAF was highly ranked in qGSL.C02.1 (Figure S5, Dataset S16). To study the effect of BnaC02.GTR2 on SGC accumulation, we analysed the relationship between variant genotypes and phenotypes of BnaC02.GTR2 in the population. The coding region and the upstream 2 kb promoter region of BnaC02.GTR2 were classified into four major haplotypes in 505 accessions, and a significant difference in SGC between the haplotypes (Figure 2d) was observed. The haplotype B possessed a significantly higher average SGC (94.40 μmol/g) than the other three haplotypes with a P‐value of 5.71 × 10−17, whereas no significant difference in average SGC among the remaining three haplotypes was observed. Therefore, we speculated that sequence variations in the BnaC02.GTR2 coding and promoter regions might be responsible for variation in SGC in B. napus. In addition, correlation analysis of population phenotype and gene expression showed that BnaC02.GTR2 positively regulated the SGC in B. napus ( = 0.40, P
20 DAF = 2.85 × 10−32; = 0.28, P
40 DAF = 1.44 × 10−21; Figures 2e,f). These results indicate that BnaC02.GTR2 could be the causal gene underlying qGSL.C02.1.
Selection analysis of QTLs associated with low SGC in B. napus breeding
To explore the QTLs that could be utilized in breeding programs to develop low‐SGC varieties, we divided the lead SNP of QTLs into ancient and derived haplotypes, of which the latter generated from the ancient alleles (Tang et al., 2020). We found that nine QTLs (qGSL.A02.1, qGSL.A02.2, qGSL.A08.1, qGSL.A09.1, qGSL.A09.2, qGSL.A09.4, qGSL.C02.1, qGSL.C09.1, and qGSL.C09.3) had specific haplotypes. Comparison of the nucleotide diversity (π) values of ancient and derived haplotypes revealed that six QTLs were strongly selected and four of these were located on chromosomes A09 and C09 (Figures 3a,c,e, and S6). Based on the structural delineation of 505 accessions in previous studies, semi‐winter population 1 (SW1) corresponded to low SGC and low‐erucic acid accessions and semi‐winter population 2 (SW2) corresponded to double‐high accessions (Tang et al., 2020). In different subpopulations, the derived alleles of qGSL.A09.2 and qGSL.C09.1 had major effects on SGC (Figures 3d,f, and S7A to I). Based on the results of QTL candidate gene prediction, BnaA09.PYRD and BnaC09.HAG1/MYB28 were predicted to be the candidate genes of qGSL.A09.2 and qGSL.C09.1, respectively (Figures S8 and S9).
Figure 3
Analysis of nucleotide diversity (π) of three SGC‐associated QTLs. π of the accession groups and the GWAS signal of the mixed linear model (MLM) in (a) qGSL.C02.1, (c) qGSL.C09.1, and (e) qGSL.A09.2. Boxplot for SGC between two types of haplotypes in (b) qGSL.C02.1, (d) qGSL.C09.1, and (f) qGSL.A09.2. (g) Venn diagram showing overlapped low‐SGC materials with low‐SGC haplotypes in qGSL.C02.1, qGSL.C09.1, and qGSL.A09.2. (h) Analysis of different QTL haplotypes in representative materials. The first row is SGC and the second row is material. ‘+’ indicates that the allele is present, ‘−’ indicates that the allele is absent. In (a), (c), and (e), the accession groups are divided by the genotype of the lead SNP in QTLs. The dots in the upper panel are sliding regions of π (20 kb window sliding in 10 kb steps) by the two accession groups, and the curves are lower smoothed dots. Colours in the GWAS signal indicate the LD (R2) between the lead SNP and each variant.
Analysis of nucleotide diversity (π) of three SGC‐associated QTLs. π of the accession groups and the GWAS signal of the mixed linear model (MLM) in (a) qGSL.C02.1, (c) qGSL.C09.1, and (e) qGSL.A09.2. Boxplot for SGC between two types of haplotypes in (b) qGSL.C02.1, (d) qGSL.C09.1, and (f) qGSL.A09.2. (g) Venn diagram showing overlapped low‐SGC materials with low‐SGC haplotypes in qGSL.C02.1, qGSL.C09.1, and qGSL.A09.2. (h) Analysis of different QTL haplotypes in representative materials. The first row is SGC and the second row is material. ‘+’ indicates that the allele is present, ‘−’ indicates that the allele is absent. In (a), (c), and (e), the accession groups are divided by the genotype of the lead SNP in QTLs. The dots in the upper panel are sliding regions of π (20 kb window sliding in 10 kb steps) by the two accession groups, and the curves are lower smoothed dots. Colours in the GWAS signal indicate the LD (R2) between the lead SNP and each variant.BnaC02.GTR2 and its paralog BnaA02.GTR2 (BnaA02g33530D) were present in qGSL.C02.1 and qGSL.A02.2, respectively. Interestingly, the ancient and derived haplotypes of the two QTLs had low and similar π values (Figures 3a and S6D). This suggests that GTR2 may have indispensable functions during seed development, making it highly conserved in B. napus. Both qGSL.C02.1 and qGSL.A02.2 had large effects on SGC in subpopulations (SW1 and SW2), and their low‐SGC haplotypes were mainly distributed in SW1 with low‐SGC phenotypes (Figures 3g and S7A,B). Interestingly, the seed oil content (SOC) and thousand seed weight (TSW) were not affected by these two QTLs based on haplotype analysis (qGSL.A02.2: p
SOC = 0.40, p
TSW = 0.42; qGSL.C02.1: p
SOC = 0.46, p
TSW = 0.08; Figure S10). Taken together, these results suggest that despite the conservation status of both BnaA02.GTR2 and BnaC02.GTR2, breeding selection strategies effectively utilized the natural variations that singularly affect SGC.Notably, the phenotypic distribution of haplotypes for the four QTLs (qGSL.A02.2, qGSL.C02.1, qGSL.A09.2, and qGSL.C09.1) were similar with the mean difference of SGC between the two haplotypes ranging from 52.60 to 67.47 μmol/g in the population (Figures 3b,d,f and S6D). To explore how the phenotypic distribution characteristics of these QTLs were shaped, we analysed the haplotypes of qGSL.A09.2, qGSL.C09.1, and qGSL.C02.1 that represent the effects of GTR2 in both qGSL.C02.1 and qGSL.A02.2 in the 154 low‐SGC accessions. These results showed that in 151 accessions, the low‐SGC haplotypes were identically associated with all three QTLs (Figure 3g). We then selected the top five accessions with extremely low SGC and extremely high SGC from the 505 accessions, respectively. Notably, the low‐SGC haplotypes in the four large‐effect QTLs (qGSL.A02.2, qGSL.C02.1, qGSL.A09.2, and qGSL.C09.1) were consistently present in the five low‐SGC accessions (Figure 3h). These results suggested that the low‐SGC accessions were mainly generated by co‐selection of multiple large‐effect QTLs, of which contained key GSL‐associated genes GTR2, PYRD, and HAG1/MYB28 (Figure 3a,c,e).
Disruption of Bna.GTR2s conferred low SGC but affected several seed traits in B. napus
The correlation coefficient analysis showed that six GTR2 homologs in B. napus could be divided into two groups. The first group, including BnaA06.GTR2 (BnaA06g22160D), BnaC03.GTR2 (BnaC03g51560D), BnaA02.GTR2 and BnaC02.GTR2 had a higher protein sequence similarity with GTR2 than the second group containing BnaA09.GTR2 (BnaA09g06190D) and BnaC09.GTR2 (BnaC09g05810D; Figures 4A and S11A). Therefore, to explore the potential functions of Bna.GTR2 homologs, we used the CRISPR/Cas9 genome editing system to edit the four homologous genes belonging to the first group (Figure S11A). We designed two sgRNAs (sgRNA1 and sgRNA2) that specifically target the major facilitator superfamily conserved domain (Wang et al., 2020) in exons 4 and 5 of all four Bna.GTR2 homologs, respectively (Figures S11B,C). In total, 29 T0‐positive transgenic plants were obtained and 10 homozygous T2 lines with mutations in the four homologous genes were identified by sequencing the target sites. The homozygous mutations caused amino acid substitutions, frameshifts or premature stop codons that resulted in truncated MFS conserved domains (Dataset S17). Further sequence analysis showed that the potential off‐target sites were not mutated (Datasets S17 and S19).
Figure 4
Functional characterization of BnaC02.GTR2 as a positive regulator of SGC in B. napus. (A) Protein sequence similarity analysis of GTR2 in B. napus and A. thaliana. (B) Boxplot of SGC (μmol/g) in WT and Bna.gtr2 mutants. (C) Boxplot of TSW (g) in WT and Bna.gtr2 mutants. (D) Boxplot of seed oil content (% of dry weight) in WT and Bna.gtr2 mutants. In (B)–(D), letters a, b, c, d, e, and f indicate significant differences by two‐way ANOVA followed by post hoc Tukey’s HSD test for all pairwise comparisons (P < 0.05); a02, c02, a06, and c03 represent mutants of BnaA02.gtr2, BnaC02.gtr2, BnaA06.gtr2, and BnaC03.gtr2, respectively; a2c2, a2c2c3, and a2c2a6c3 represent double, triple, and quadruple mutants of Bna.gtr2s. Each colour represents a material, and each dot represents a replicate. The middle bars represent the median, while the bottom and top of each box represent the 25th and 75th percentiles, respectively. (E) Phenotypes of 50 seeds of WT and mutants, Bar = 1 cm. Seed diameters (mm) are the mean ± SD of five biological replicates.
Functional characterization of BnaC02.GTR2 as a positive regulator of SGC in B. napus. (A) Protein sequence similarity analysis of GTR2 in B. napus and A. thaliana. (B) Boxplot of SGC (μmol/g) in WT and Bna.gtr2 mutants. (C) Boxplot of TSW (g) in WT and Bna.gtr2 mutants. (D) Boxplot of seed oil content (% of dry weight) in WT and Bna.gtr2 mutants. In (B)–(D), letters a, b, c, d, e, and f indicate significant differences by two‐way ANOVA followed by post hoc Tukey’s HSD test for all pairwise comparisons (P < 0.05); a02, c02, a06, and c03 represent mutants of BnaA02.gtr2, BnaC02.gtr2, BnaA06.gtr2, and BnaC03.gtr2, respectively; a2c2, a2c2c3, and a2c2a6c3 represent double, triple, and quadruple mutants of Bna.gtr2s. Each colour represents a material, and each dot represents a replicate. The middle bars represent the median, while the bottom and top of each box represent the 25th and 75th percentiles, respectively. (E) Phenotypes of 50 seeds of WT and mutants, Bar = 1 cm. Seed diameters (mm) are the mean ± SD of five biological replicates.Subsequently, we measured the SGC phenotype of different Bna.gtr2 homozygous mutants. Each of the four single‐locus Bna.gtr2 mutants showed significant decrease in SGC compared with the wild‐type (WT) recipient (cv.13CK‐3), with BnaA02.gtr2 (a02) and BnaC02.gtr2 (c02) showed the most significant decrease, whereas BnaC03.gtr2 (c03) and BnaA06.gtr2 (a06) showed a moderate decline. Moreover, the BnaA02.gtr2 and BnaC02.gtr2 double mutant (a2c2), as well as BnaA02.gtr2, BnaC02.gtr2 and BnaC03.gtr2 triple mutants (a2c2c3) also exhibited significant decreases in SGC compared with a02 or c02. Notably, the average SGC of the BnaA02.gtr2, BnaC02.gtr2, BnaA06.gtr2, and BnaC03.gtr2 quadruple mutant (a2c2a6c3) was 18.21 μmol/g, representing an 86.85% decrease compared with WT (Figure 4B; Dataset S20). These results revealed the crucial function of Bna.GTR2s in seed GSL accumulation. In addition, our findings showed that different Bna.GTR2 homologs studied here affected SGC to different degrees and a partially functional redundancy commonly existed among the four homologs.Surprisingly, the TSW of the a2c2a6c3 quadruple mutant was decreased by 64.35% compared with that of WT seeds, similar to the reduction degree of SGC (Figure 4C; Dataset S21). Additionally, seed diameter showed significant decrease in Bna.gtr2 mutants compared with WT seeds (Figures 4E and S12, Dataset S22). Previous studies have shown that the main storage compounds accumulated in seeds of B. napus consist of storage triacylglycerols and proteins (Schwender et al., 2006). To investigate whether Bna.GTR2s affect other seed traits of B. napus; we examined SOC, protein content, and fatty acid composition in the mutants and WT seeds. Interestingly, the Bna.gtr2 mutant seeds showed a significant increase in SOC, a significant change in fatty acid composition, and varying degrees of decrease in protein content (Figures 4D and S13, Dataset S20). These results suggest that Bna.GTR2s can affect not only the SGC but also seed size and may participate in multiple aspects of dry matter accumulation in B. napus seeds.GSLs are synthesized mainly in vegetative tissues, such as leaves and roots, and are finally transported to seeds in Arabidopsis. Therefore, we also detected the GSL content in the roots and leaves of BnaC02.gtr2 mutants. The results showed that the GSL content of the roots and the leaves of the mutants were 9.42 and 6.29 μmol/g, respectively. Compared with WT, the GSLs content of the mutants were significantly decreased in the leaves but significantly increased in the roots, which means that the transport of GSL synthesized in the roots to other vegetative tissues may be inhibited, resulting in the accumulation of GSL in the roots. (Figure S14, Dataset S23).
BnaC02.GTR2 is involved in multiple processes during the seed development of B. napus
It is noteworthy that the altered SGC of BnaC02.gtr2 mutant was accompanied by a variety of traits associated with seed development (Figures 4B to E), which means that SGC‐related genes or BnaC02.GTR2 may be associated with seed development. To verify this hypothesis, we first re‐observed the genes detected by TWAS at 20 and 40 DAF that were significantly associated with SGC and found that 73 and 56 of them were associated with amino acid, sugar, and fatty acid accumulation in Arabidopsis seeds, respectively (Figures S15a,b, Datasets S24 and S25). These results suggested that SGC‐related genes were inextricably linked to the process of seed development. Then, we conducted a correlation analysis between the expression of BnaC02.GTR2 and genes significant in TWAS related to seed development at 20 DAF [the critical period of seed dry matter accumulation (Yu et al., 2010)]. The results showed that BnaC02.GTR2 was significantly positively correlated with genes, such as ARF2 (AT5G62000), EMB2024 (AT5G24400), etc., and negatively correlated with genes, such as SWEET5 (AT5G62850), TT4 (AT5G13930), and UGNT1 (AT4G32272). These genes are involved in multiple seed developmental processes (Figure 5a, Dataset S26; Bock et al., 2006; Engel et al., 2005; Meinke, 2020; Okushima et al., 2005; Schruff et al., 2006; Tzafrir et al., 2004; Wang et al., 2008; Yu et al., 2005). Next, we constructed a co‐expression analysis of BnaC02.GTR2 and seed development‐related genes identified by TWAS at both 20 and 40 DAF and found that BnaC02.GTR2 is associated with multiple processes, such as plant development, protein accumulation, amino acid synthesis, GSL synthesis, sugar assimilation, and endosperm development (Figure 5b; Dataset S26). In addition, we used the gene expression of BnaC02.GTR2 at 20 DAF as a covariate for TWAS to re‐performed TWAS analysis. Four hundred and six genes disappeared significantly compared with the previous result, and these genes were significantly enriched in the synthesis of GSL and different pathways of seed development (Dataset S27, Figure S16). These results suggested that BnaC02.GTR2 is an important factor affecting genes related to seed development.
Figure 5
BnaC02.GTR2 possesses pleiotropic effects during seed development. (a) Correlation analysis of BnaC02.GTR2 and genes related to seed development. Red circles represent positive correlations with the expression of BnaC02.GTR2. Blue circles represent negative correlations with the expression of BnaC02.GTR2 and grey circles represent no correlation. Light green lines indicate the correlation between other genes. (b) Co‐expression analysis of BnaC02.GTR2 at 20 and 40 DAF. Circles represent genes. Squares represent the biological process; shape size represents the absolute value of the correlation between BnaC02.GTR2 and the genes in biological processes. (c) Volcano plot of BnaC02.gtr2 mutant. The Y‐axis is the adjusted P‐value, and the X‐axis is log2 fold‐change (FC) between mutant and WT. Grey lines are at log2FC = 1 or adjusted P‐value = 0.05. The significant genes in TWAS for SGC at 20 DAF are labelled with red points with gene names. (d) Enrichment analysis with differentially expressed gene set between BnaC02.gtr2 mutant and WT at 20 DAF.
BnaC02.GTR2 possesses pleiotropic effects during seed development. (a) Correlation analysis of BnaC02.GTR2 and genes related to seed development. Red circles represent positive correlations with the expression of BnaC02.GTR2. Blue circles represent negative correlations with the expression of BnaC02.GTR2 and grey circles represent no correlation. Light green lines indicate the correlation between other genes. (b) Co‐expression analysis of BnaC02.GTR2 at 20 and 40 DAF. Circles represent genes. Squares represent the biological process; shape size represents the absolute value of the correlation between BnaC02.GTR2 and the genes in biological processes. (c) Volcano plot of BnaC02.gtr2 mutant. The Y‐axis is the adjusted P‐value, and the X‐axis is log2 fold‐change (FC) between mutant and WT. Grey lines are at log2FC = 1 or adjusted P‐value = 0.05. The significant genes in TWAS for SGC at 20 DAF are labelled with red points with gene names. (d) Enrichment analysis with differentially expressed gene set between BnaC02.gtr2 mutant and WT at 20 DAF.We also analysed the transcriptome of WT and BnaC02.gtr2 mutant in developing seeds at 20 DAF. We identified 45 076 expressed genes (mean TPM > 1 for each type), including 1234 were differentially expressed (|log2foldchange| > 1 and Padj < 0.05). A significant decrease in the expression of genes associated with GSL synthesis and transport was observed. Interestingly, not only is the expression of genes regulating aliphatic GSL [such as HAG1/MYB28, CYP83A1, BAT5, AOP2 (AT4G03060)] significantly reduced, but also for the synthesis of indolic GSL [e.g. CYP81F1 (AT4G37430), GSTF9 (AT2G30860), and IGMT1 (AT1G21100)], suggesting possible crosstalk between GSL transport and synthesis pathways. And we also observed a significant reduction in the expression of indolic GSL transporter GTR3. These implied that aliphatic and indolic GSL transporters may cooperate for GSL transport (Figure 5c; Datasets S28 and S29). Moreover, the expression of genes related to seed development also changed significantly [such as MYC4 (AT4G17880), CYP78A9 (AT3G61880), and CCoAOMT7 (AT4G26220)]. Among them, MYC4 plays a redundant role with MYC3 and MYC2 in the accumulation of Arabidopsis seed storage protein (Gao et al., 2016), which might be related to the protein content alteration in the seeds of BnaC02.gtr2 mutant (Figure S13A). Cytochrome P450 CYP78A9 is involved in Arabidopsis reproductive development and also regulates silique length and seed weight in B. napus (Sotelo‐Silveira et al., 2013). In B. napus, a CACTA‐like transposable element in the upstream region of BnaA9.CYP78A9 acts as an enhancer to increase silique length and seed weight (Shi et al., 2019). CCoAOMT7 encodes a caffeoyl‐coenzyme A O‐methyltransferase (CCoAOMT)‐like protein, which contributes to lignin biosynthesis in the cell wall and may have a role on the development of seed coat (Yang et al., 2019; Yu et al., 2009). Furthermore, GO enrichment analysis indicated that the differentially expressed gene set between BnaC02.gtr2 mutant and WT at 20 DAF was enriched in multiple biological processes, including sulphate assimilation, GSL biosynthesis, carbohydrate metabolism, lipid binding, transporter activity, amino acid binding, and oligopeptide transport (Figure 5d; Dataset S30).Accordingly, these findings strongly suggest that SGC‐related genes have possible associations with seed development. BnaC02.GTR2, with pleiotropic functions, might not only play a role in seed GSL accumulation, but also be involved in multiple approaches of seed development in B. napus.
Discussion
GSLs are secondary metabolites distributed specifically in cruciferous plants (Halkier and Gershenzon, 2006; Sonderby et al., 2010). Although GSL has a very weak effect on the quality of crushed rapeseed oil, they can largely determine the commercial value of crushed meals of Brassica crops (Bisht and Augustine, 2019; Ishida et al., 2014; Kuznetsova and Polyakova, 2019). However, it is challenging to further reduce the SGC to <15 µmol/g dry meal in B. napus, despite persistent efforts made in the past half‐century, which greatly impedes the utilization of rapeseed meal as an alternative to soybean meal. In this study, we successfully dissected the genetic basis of SGC in B. napus through an integrative strategy by combining the GWAS and TWAS approaches. Compared with previous studies, our GWAS analysis showed high accuracy and reliability in identifying the loci controlling SGC—the SGC‐QTLs were consistent with the QTLs identified on chromosomes A02, A09, C02, C07, and C09 reported before (Cun‐Min et al., 2015; Feng et al., 2012; Li et al., 2014; Liu et al., 2020). In addition, several novel QTLs, such as qGSL.C03.1 and qGSL.A09.4 (Figure 1a; Dataset S2), were also identified.Seed development at 20 and 40 DAF is important for seed expansion, embryo development and accumulation of protein and GSL in B. napus (Dong et al., 2004; Larden and Triboi‐Blondel, 1994; Yu et al., 2010). Combining the results from TWAS, WGCNA, and POCKET, we predicted the potential candidate genes underlying the SGC‐QTLs and revealed some key genes involved in seed GSL accumulation. We identified 324 overlapping genes that were significantly associated with SGC in the two seed development stages. These genes significant in TWAS are likely to play a role in the regulation of SGC directly or indirectly. Seventy‐four GWAS and TWAS co‐localized genes, such as HAG1/MYB28 (Sonderby et al., 2007), BCAT4 (AT3G19710; Schuster et al., 2006), and MAM1 (AT5G23010; Textor et al., 2004) involved in diverse dimensions of GSL synthesis and accumulation and the genes responsible for GSL transport such as GTR2 (Nour‐Eldin et al., 2012) were identified. Overall, these results based on integrative mega data analysis, for the first time, constructed a comprehensive network including QTLs and co‐expressed genes involved in the regulation of SGC. These findings could assist in unravelling the genetic architecture of GSL synthesis and transport and exploring the evolution of gene function in related Brassica plants.Since low SGC is a major breeding objective of B. napus for the past decades, we performed a selection analysis of the QTLs associated with SGC. As expected, most low‐SGC haplotypes in these QTLs were selected in modern ‘double‐low’ varieties. Furthermore, the haplotype correlation analysis of loci in high and low SGC accessions demonstrated a clear co‐selection of multiple QTLs, and most low‐SGC accessions were produced mainly by selecting three QTLs—qGSL.C02.1, qGSL.A09.2, and qGSL.C09.1 (Figure 3g,h and S6). As evident from the breeding history of low‐SGC varieties, the reduced concentrations of GSL are often a consequence of crossing programs that have used the ‘Bronowski’ germplasm, and the favourable alleles conferring the low SGC phenotype were originated from ‘Bronowski’ (Gupta and Pratap, 2007; Sharpe and Lydiate, 2003). In China and other countries, ‘Liho’ and ‘Bronowski’ are used as primary gene donors to breed cultivars with double‐low quality (<1% erucic acid, <30 μmol/g SGC; Li and Wang, 2017). Here, the co‐selection of the three highly consensus haplotypes further supported their inheritance from a single resource, most likely from ‘Bronowski.’ Collectively, it can be inferred that to further reduce SGC in the future to meet the increasing demand for protein feed, it is necessary to diversify the source of favourable alleles. To this end, developing new germplasms by introducing novel variations at different SGC‐loci identified in this study or adopting new breeding strategies could be useful.Considering the diverse functions of GSL in biotic stress resistance, the ideal rapeseed varieties are expected to contain high GSL in vegetative tissues (including the silique walls) and low or no GSL in mature seeds (Giamoustaris and Mithen, 1995; Liu et al., 2020). However, as evident from the findings of GWAS in this study, the low‐SGC trait, as the main breeding target, would bring a strong selection pressure on both the genes responsible for GSL synthesis and those involved in GSL transport in traditional breeding. Reportedly, the GSL content varies in the leaves and seeds of natural germplasms of B. napus (Liu et al., 2020); therefore, it can be expected that the goal of breeding B. napus with high GSL in the vegetative tissues and low/no GSL in the seeds is achievable by mining the alleles involved in the synthesis and transport of GSL. In this study, both GWAS and TWAS have shown that the elite haplotype of BnaC02.GTR2 has a confirmed function in reducing the SGC in B. napus. However, the genetic contribution of the favourable allele in BnaC02.GTR2 was influenced by the favourable alleles at the other two QTLs due to co‐selection; therefore, its real effect on reducing GSL accumulation in natural accessions should be carefully evaluated in the future. Furthermore, we verified that all four Bna.GTR2 homologs contributed to the regulation of SGC, and their effects varied among different copies. Consistent with the evidence in Arabidopsis (Nour‐Eldin et al., 2012), B. rapa and B. juncea (Nour‐Eldin et al., 2017), our results clearly indicated the possibility of using GTRs to exclusively regulate SGC in B. napus (Figure 4b). However, investigation of important seed traits of mutants indicated the pleiotropic roles of Bna.GTR2s during seed development, affecting other important seed traits, such as TSW and SOC. This speculation was subsequently confirmed by transcriptome analysis of mutants (Figure 5c,d), TWAS, and co‐expression analysis (Figure 5a,b), in which BnaC02.GTR2 with genes involved in multiple biological processes, such as amino acid accumulation, GSL synthesis and transport, sugar assimilation, and oil accumulation.From an agricultural perspective, the utilization of source–sink transport engineering for GTR transporters is expected to serve as a new strategy to address the challenges of oilseed breeding (Nour‐Eldin et al., 2012). Although the selection analysis revealed that the genomic regions flanking BnaC02.GTR2 displayed low nucleotide diversity, the haplotypes leading to low‐SGC were selected during breeding because of their significant effect on the reduction of SGC (Figure 3a,b). Moreover, different unlike CRISPR mutants of BnaC02.gtr2, the natural low‐SGC haplotype had no significant effect on seed traits, such as TSW and SOC (Figures 6 and S10). Based on these findings, we conjectured that the low‐SGC haplotype of BnaC02.GTR2 was utilized in the breeding selection process that singularly affected SGC in B. napus. In other words, the function of BnaC02.GTR2 might have been modified partially by the selection process; therefore, even though the function of BnaC02.GTR2 and the low‐SGC BnaC02.GTR2 allele had a reduced GSL transportation activity, it was still able to maintain the normal development and metabolism of the seeds. These findings suggest that the functional difference between the natural variations and CRISPR mutants of BnaC02.gtr2 might depend on the structural alteration of the gene.
Figure 6
A putative low‐SGC breeding selection model. The dashed line indicates complete loss of function of GTR2 is not utilized, and the solid line indicates the process of breeding selection. An oval divided in half indicates a loss of function of BnaC02.GTR2. The dashed box indicates the processes potentially affected by BnaC02.GTR2. Boxplot indicates the SGC in low‐SGC SW1 subpopulation corresponding to two major haplotypes of four loci selected during breeding.
A putative low‐SGC breeding selection model. The dashed line indicates complete loss of function of GTR2 is not utilized, and the solid line indicates the process of breeding selection. An oval divided in half indicates a loss of function of BnaC02.GTR2. The dashed box indicates the processes potentially affected by BnaC02.GTR2. Boxplot indicates the SGC in low‐SGC SW1 subpopulation corresponding to two major haplotypes of four loci selected during breeding.The GTR transporters belong to the ubiquitous NRT/PTR family (NPF) and may have evolved from the original cyanogenic glucoside transporter transporting either cyanogenic glucosides or GSL (Jorgensen et al., 2017). Structural studies of bacterial POTs and NPF6.3 (Aduri et al., 2015; Doki et al., 2013; Parker and Newstead, 2014; Solcan et al., 2012; Sun et al., 2014) suggested the possible roles of amino acid residues P7 and P9 in the interaction with glucose (Jorgensen et al., 2017). However, whether these residues can interact with GSL and other vital metabolites with similar structures indiscriminately, thereby affecting SGC and seed development, remains unclear. Moreover, knowledge of the gene structure, substrate binding specificity and regulatory mechanisms of GTR transporters remain to be explored in B. napus. Taken together, our results provide a promising strategy for further application of GTR transporters to specifically reduce the SGC in B. napus, such as the precise modification by gene editing.
Methods
Plant materials and collection of phenotype data
A total of 505 B. napus accessions (Tang et al., 2020) collected from different research institutions were grown in Wuhan for 3 years (2017–2019). Leaves at seeding stage were sampled and quickly frozen in liquid nitrogen. Mature open‐pollinated seeds of the natural population comprising 505 B. napus inbred lines were harvested and desiccated to analyse SGC and other traits (SOC, seed protein content, and seed fatty acids) by a Foss NIRSystems 5000 near‐infrared reflectance spectroscope (Gan et al., 2003). Six replications were used for the analysis of every accession. The measurements were carried out at the National Research Center of Rapeseed Engineering and Technology, Huazhong Agricultural University, Wuhan, China.
Variation calling and annotation
The genome sequences of B. napus were downloaded from GenBank (http://www.genoscope.cns.fr/brassicanapus/; Chalhoub et al., 2014). Readings from resequencing (Tang et al., 2020) were compared with the genome using the parameter 'mem ‐M ‐k 32 ‐t 4' via the BWA software (Li, 2013). Then, PCR duplicates of sequencing reads were removed using SAMtools and mapped reads were retained in the BAM file format (Li et al., 2012). For genotyping, we applied the previous projections (Tang et al., 2020). GATK v3.6 (McKenna et al., 2010) was used to identify the existing sequence variations in 505 B. napus accessions and to merge the files. The SNPs and InDels with lower mapping quality (MQ < 20) or reads with lower sequencing depth (DP < 50) were also filtered. After obtaining the genotyping data, the LD‐KNN algorithm was used to estimate the missing genotype (Chen et al., 2014), and the accuracy of the data was ensured at >99.7%.
GWAS for SGC
A total of 10 620 048 high‐quality SNPs (MAF > 0.05) in all populations were used for GWAS and the significance thresholds for association were calculated using the Genetic Type I Error Calculator (GEC) software (Li et al., 2012) as 7.96 × 10−7. We used the GEMMA software package (Zhou and Stephens, 2012) to correlate the 3‐year WH SGC trait data using a mixed linear model. The POCKET from the GitHub (https://github.com/zhaouu/POCKET; Tang et al., 2020) was downloaded and combined multiple omics data, large‐effect variations, haplotype analysis, and known trait‐associated functional genes to prioritize candidate genes in SGC‐related QTLs of B. napus.
Weighted gene co‐expression network analysis
To study the significantly associated gene co‐expression profiles of GWAS and TWAS at 20 and 40 DAF, the expression of genes at each period was analysed by the WGCNA method (Langfelder and Horvath, 2008). The expression matrices were constructed, genes with similar expression patterns were clustered into the same module, and correlation analysis was performed with SGC. Pearson correlation coefficients were calculated for the correlation analysis. The networkType and TOMType parameters are set to ‘signed’. The modules were divided using the following parameters: deepSplit – 4, minModuleSize – 20, and mergeCutHeight – 0.1. The 20 and 40 DAF soft power were set at 14 and 8, respectively. Finally, correlation analysis was performed using Cytoscape v3.5.0 software for thioside, and the module and gene correlation networks were visualized and displayed.
TWAS for SGC
RNA‐seq data were analysed for quality control using FastQC (Ewels et al., 2016). STAR software (Dobin et al., 2013) was used to compare filtered reads with the B. napus reference genome, quantitative analysis of gene expression using Salmon (Patro et al., 2017), and standardization using tximport (Charlotte et al., 2016). Correlation analysis was performed using the EMMAX software (Kang et al., 2010), as well as inline and LMMs. SNPs were randomly selected from the genome to calculate the kinship matrix. The FDR‐corrected P‐value of ≤ 0.05 was used as the significance threshold for TWAS analysis.
GO enrichment analysis
All B. napus genes were compared with all A. thaliana proteins using BLASTP (Jacob et al., 2008). The most important proteins in A. thaliana (probability threshold 10−5 −) were selected for enrichment with the A. thaliana genome. Enrichment analysis was performed using Fisher's exact test, and the OR and P values were calculated.
Construction of the CRISPR/Cas9 vector and plant transformation
The CRISPR/Cas9 genome editing system, used in this study to edit the Bna.GTR2 genes was developed following previous methods (Xing et al., 2014). The primers used for the construction of the sgRNA vector are listed in the Dataset S18. The hypocotyl transformation mediated by Agrobacterium tumefaciens in B. napus reported by Zhang et al. (2015) was used in the genetic transformation system in this study (Zhang et al., 2015). Here ‘cv.13CK‐3’ (wild‐type, a semi‐winter‐type rapeseed variety with 137.25 μmol/g SGC and 34.26% erucic acid in seed) was used as the donor plants for transformation. The T0 transgenic plants were grown in a greenhouse (16/8 h light/dark cycle at 23–25 °C) for two generations to obtain the T2 generation plants. Subsequently, the T1 and T2 transgenic plants and WT plants were grown in the transgenic experimental field of Huazhong Agricultural University. Each line was planted in two rows with 10 plants per row, with a spacing of 21 cm between plants within each row and 30 cm between rows. Field management was carried out according to standard breeding practices. The leaves from WT and mutant plants were collected using Transzol sampling for DNA extraction. CRISPR‐transformed B. napus plants were first used to identify the Cas9 protein using primers, and the Cas9‐positive plants were then specifically amplified and sequenced to identify the target gene. The amplified target fragments were sequenced, and the sequencing results were analysed using DSDecode online (http://skl.scau.edu.cn/dsdecode/) for target site editing. The fast restriction endonuclease and T4 DNA ligase used in this study were purchased from Thermo Fisher Scientific (China). The E. coli strain, DH5α (Beijing Transgenic Biotechnology, China) and the Agrobacterium strain GV3101 (prepared in our laboratory) were used in this experiment. Mature open‐pollinated seeds were harvested and analysed by near‐infrared reflectance spectroscopy, as described above.
Analysis of GSL content in vegetative tissues
The GSL content in the leaf and root of WT (cv.13CK‐3) and BnaC02.gtr2 mutants obtained using CRISPR/cas9 were analysed. Three biological replicates were used. In the three‐leaf stage of seedlings, roots and the third leaf after the heart leaf were taken from each seedling as a biological replication. Fresh roots and leaves were immediately frozen in liquid nitrogen and stored at −80 °C until GSL extraction. The samples were quickly broken after weighing, and mixed with 20 mL 90% methanol, then heated in a water bath and centrifuged, then the supernatant was transferred to a 12 mL column containing 1ml DEAE Sephadex A‐25 (sigma, St. Louis, MO, USA). After the liquid was drained, 100 μL 2 mg/mL sulphatase solution was added to desulphurize the GSL, using 2‐propenyl GSL (Sinigrin, Sigma‐Aldrich, Co., St. Louis, MO, USA) as an internal standard as described by Feng et al. (2012). GSLs were analysed via HPLC (Agilent Technologies, CA) following previously described methods (Kliebenstein et al., 2001a).
RNA‐seq transcriptomic analysis
The seeds of WT and BnaC02.gtr2 mutants at 20 DAF were collected for RNA‐seq with three biological replicates. Total RNA was extracted using the TIANGEN RNAprep Pure Plant Kit. A total amount of 1.5 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using the NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA). The library preparations were sequenced on an Illumina Hiseq 4000 platform and 150 bp paired‐end reads were generated. Quality control of RNA‐seq reads was performed using FastQC software and summarized using multi‐qc (Ewels et al., 2016). We used Salmon (Patro et al., 2017) to quantify the reads of annotated genes in the B. napus genome (http://www.genoscope.cns.fr/brassicanapus/), which were then imported and normalized with tximport (Soneson et al., 2015) and DESeq2 was used for differential expression analysis (Love et al., 2014).
Conflict of interest
The authors declare no competing interests.
Author contributions
H.D. and L.G. designed and supervised the study. Z.T., Y.Z., and H.Z. performed the bioinformatics analysis. Z.X., L.D., and L.W. performed the GTR2‐related experiments. S.T. analysed the SGC of the association population. Z.T., Z.X., L.D., and Y.Z. prepared the manuscript. D.H., L.G., and X.Y. revised the manuscript. All the authors have read and approved the manuscript.Figure S1 Histograms and correlation coefficients of BLUP values and the 3‐year distribution of SGC in Wuhan for 505 accessions.Figure S2 GWAS of SGC for 2017–2019 in Wuhan.Figure S3 GO enrichment of genes significant in TWAS for SGC.Figure S4 Soft threshold power analysis to obtain scale‐free fit metrics for network topology at 20 DAF (A) and 40 DAF (B).Figure S5 Prioritization of the causal genes in qGSL.C02.1.Figure S6 Analysis of nucleotide diversity (π) and QTL haplotypes.Figure S7 In A–I, the boxplot of SGC between the two types of haplotypes of different QTLs in SW1 and SW2.Figure S8 Prioritization of the causal genes in qGSL.A09.2.Figure S9 Prioritization of the causal genes in qGSL.C09.1.Figure S10 Boxplots for TSW and SOC.Figure S11 Characterization of editing of Bna.GTR2 genes using the CRISPR/Cas9 system.Figure S12 Comparison of the diameter of mature seeds between Bna.gtr2 mutants.Figure S13 Boxplot of seed traits in WT and Bna.gtr2 mutants.Figure S14 GSL content in leaves (A) and roots (B) (μmol/g) of WT and BnaC02.gtr2 mutants.Figure S15 Seed development‐related genes overlapped with genes significant in TWAS in SGC.Figure S16 Enrichment analysis of significant gene set that detected in TWAS 20 DAF results with BnaC02.GTR2 as a covariate.Click here for additional data file.Dataset S1 Lead SNPs significantly associated with SGC for Wuhan2017, Wuhan2018, and Wuhan2019.Dataset S2 SGC Wuhan 2017, SGC Wuhan 2018, SGC Wuhan 2019, SGC Wuhan BLUP co‐location GWAS sites.Dataset S3 Information on GWAS‐associated candidate genes in 15 QTLs.Dataset S4 Summary of genes significant in TWAS for SGC at 20 DAF.Dataset S5 Summary of genes significant in TWAS for SGC at 40 DAF.Dataset S6 Glucosinolate‐related genes were significantly associated with SGC identified by TWAS at 20 DAF.Dataset S7 Glucosinolate‐related genes were significantly associated with SGC identified by TWAS at 40 DAF.Dataset S8 GO enrichment of genes significant in TWAS for SGC at 20 DAF.Dataset S9 GO enrichment of genes significant in TWAS for SGC at 40 DAF.Dataset S10 GO enrichment of overlapped genes significant in TWAS for SCC at 20 and 40 DAF.Dataset S11 List of WGCNA module‐trait relationships at 20 DAF.Dataset S12 List of WGCNA module‐trait relationships at 40 DAF.Dataset S13 List of WGCNA genes for SGC at 20 DAF.Dataset S14 List of WGCNA genes for SGC at 40 DAF.Dataset S15 Genes significant in TWAS located in the SGC‐related QTLs.Dataset S16 Gene prioritization for qGSL.C02.1.Dataset S17 Detection of mutations at CRISPR/Cas9 target sites in T2 generation.Dataset S18 Primers used in this study.Dataset S19 Detection of mutations at the putative CRISPR/Cas9 off‐target sites in the T1 generation.Dataset S20 Comparison of fatty acid composition (%), oil content (%), SGC (μmol/g), and protein content (%) in mature seeds between Bna.gtr2 mutants and their control in the T2 generation.Dataset S21 Thousand seed weight (g) between Bna.gtr2s mutants and their control in T2 generation.Dataset S22 Comparison of diameter (mm) of mature seeds between Bna.gtr2 mutants and their control in the T2 generation.Dataset S23 GSL content in leaves and roots (μmol/g) of WT and BnaC02.gtr2 mutants.Dataset S24 Seed development‐related genes were significantly associated with SGC identified by TWAS at 20 DAF.Dataset S25 Seed development‐related genes were significantly associated with SGC identified by TWAS at 40 DAF.Dataset S26 List of network module genes for SGC at 20 and 40 DAF.Dataset S27 Summary of 20 DAF TWAS significant genes for SGC in the case of BnaC02.gtr2 as a covariate.Dataset S28 RNA‐seq information on the expression of WT and mutant‐expressed genes in seeds at 20 DAF.Dataset S29 Information on differentially expressed genes between WT and BnaC02.gtr2 mutants.Dataset S30 GO enrichment of differentially expressed genes between WT and BnaC02.gtr2 mutants.Click here for additional data file.
Authors: Marie C Schruff; Melissa Spielman; Sushma Tiwari; Sally Adams; Nick Fenby; Rod J Scott Journal: Development Date: 2005-12-08 Impact factor: 6.868
Authors: Susanne Textor; Stefan Bartram; Jürgen Kroymann; Kimberly L Falk; Alastair Hick; John A Pickett; Jonathan Gershenzon Journal: Planta Date: 2004-01-22 Impact factor: 4.116
Authors: Nicolae Solcan; Jane Kwok; Philip W Fowler; Alexander D Cameron; David Drew; So Iwata; Simon Newstead Journal: EMBO J Date: 2012-06-01 Impact factor: 11.598
Authors: Morten Egevang Jørgensen; Deyang Xu; Christoph Crocoll; Heidi Asschenfeldt Ernst; David Ramírez; Mohammed Saddik Motawia; Carl Erik Olsen; Osman Mirza; Hussam Hassan Nour-Eldin; Barbara Ann Halkier Journal: Elife Date: 2017-03-03 Impact factor: 8.140