Seed Quality Traits Can Be Predicted with High Accuracy in Brassica napus Using Genomic Data.

Jun Zou1, Yusheng Zhao2, Peifa Liu1, Lei Shi1, Xiaohua Wang1, Meng Wang1, Jinling Meng1, Jochen Christoph Reif2.   

Abstract

Improving seed oil yield and quality are central targets in rapeseed (Brassica napus) breeding. The primary goal of our study was to examine and compare the potential and the limits of marker-assisted selection and genome-wide prediction of six important seed quality traits of B. napus. Our study is based on a bi-parental population comprising 202 doubled haploid lines and a diverse validation set including 117 B. napus inbred lines derived from interspecific crosses between B. rapa and B. carinata. We used phenotypic data for seed oil, protein, erucic acid, linolenic acid, stearic acid, and glucosinolate content. All lines were genotyped with a 60k SNP array. We performed five-fold cross-validations in combination with linkage mapping and four genome-wide prediction approaches in the bi-parental population. Quantitative trait loci (QTL) with large effects were detected for erucic acid, stearic acid, and glucosinolate content, blazing the trail for marker-assisted selection. Despite substantial differences in the complexity of the genetic architecture of the six traits, genome-wide prediction models had only minor impacts on the prediction accuracies. We evaluated the effects of training population size, marker density and phenotyping intensity on the prediction accuracy. The prediction accuracy in the independent and genetically very distinct validation set still amounted to 0.14 for protein content and 0.17 for oil content reflecting the utility of the developed calibration models even in very diverse backgrounds.

Year:  2016        PMID: 27880793      PMCID: PMC5120799          DOI: 10.1371/journal.pone.0166624

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Rapeseed (Brassica napus L.) is one of the most important oilseed crops worldwide [1]. The breeding goal for rapeseed is high oil yield coupled with excellent oil quality [2-4]. The latter is mainly driven by the fatty acid composition, in particular the contents of erucic acid (C22:1), stearic acid (C18:0), oleic acid (C18:1), linoleic acid (C18:2), and linolenic acid (C18:3) [2, 3, 5]. Moreover, protein and glucosinolate content determine to a large extent the quality of the rapeseed meal [6-8]. All of these seed traits are influenced by the environment [9-11], and their precise estimation requires phenotyping in replicated multi-environmental field trials. In addition, measuring quality traits in rapeseed is often labor-intensive. Quality traits are therefore interesting targets for genomic-assisted crop improvement.

Genomic-assisted crop improvement can be based either on marker-assisted selection [12] or on genome-wide prediction [12, 13]. In marker-assisted selection, the performance of individuals is predicted using a few diagnostic markers associated with the traits under consideration [14]. In contrast, genome-wide prediction exploits many markers without performing marker-specific significance tests [15]. The accuracy of both approaches depends on the genetic architecture underlying the traits under consideration. Marker-assisted selection is most effective if the trait is controlled by a few genes with large effects. If the genetic architecture is complex, quantitative trait loci (QTL) detection is not reliable and genome-wide prediction is more powerful [16].

The presence of QTL underlying quality traits in rapeseed has been investigated in linkage and linkage disequilibrium mapping studies [1, 3, 9–11, 17–28]. QTL for seed quality traits such as seed fatty acid content have also been identified in other Brassica species, such as B. oleracea and B. juncea [29, 30], which provides a reference for comparisons between species. However, linkage and linkage disequilibrium mapping often yield upward-biased estimates of the proportion of genotypic variance explained by QTL. Cross-validation or independent validation has therefore been suggested to obtain unbiased estimates of QTL effects, but has been applied in only a limited number of studies in rapeseed [31, 32].

The potential and limits of genome-wide prediction have been examined for several major crops, such as barley [33], wheat [15, 34–36], maize [37-42], rice [43], sunflower [44], forage plants [45], sugar beet [46, 47], and soybean [48, 49]. The results underlined the potential of genome-wide prediction as a powerful tool to accelerate selection gain in plant breeding. Recent studies in rapeseed also highlighted the potential of genome-wide prediction for flowering time [31, 50, 51], plant height, protein content, oil content, glucosinolate content, and grain yield [31, 51]. Nevertheless, the benefits of genome-wide prediction compared to marker-assisted selection have not been examined in rapeseed. Moreover, the potential to exploit epistasis to predict seed quality traits has not been investigated, although previous studies suggested that epistatic interactions are important for fatty acid metabolism [11].

This study is based on a published dataset from the bi-parental TN DH population comprising 202 DH lines, which has been used intensively to study the genetic architecture of important agronomic traits [9–11, 22, 23]. The lines were genotyped with an Infinium 60K SNP array [52] that has been used extensively in Brassica [24, 53, 54]. The two parents of the TN DH mapping population originated from the European and Chinese genepools and have been used widely in rapeseed breeding programs in both target regions.

Our objectives were to (i) test for the presence of QTL exhibiting reliable and large effects using five-fold cross-validations, (ii) investigate the effect of the genetic architecture on the superiority of different genome-wide prediction models, (iii) examine the potential to improve the prediction accuracy by modeling digenic epistatic effects, (iv) validate the prediction accuracy in a genetically independent population, and (v) discuss the consequences for implementing genome-wide predictions in applied rapeseed breeding programs.

Materials and Methods

Plant materials and field trials

A bi-parental DH population of B. napus, denoted TN DH and comprising 202 unique lines, has been developed previously [22]. The DH lines were derived by microspore culture from the F1 of the cross between Tapidor and Ningyou7. The parent Tapidor is a European winter cultivar with low erucic acid and glucosinolate content in the seeds. The parent Ningyou7 is a Chinese semi-winter cultivar with high erucic acid and glucosinolate content in the seeds. The TN DH mapping population along with its two parents was grown in 11 winter and semi-winter ecotype environments (S1 Table). The phenotypic data were generated and used in previous linkage mapping studies, which were based on a limited set of markers [9–11, 22, 23]. The experimental design was a randomized complete block design with 3 replications. Every plot comprised three rows with a total plot size of 3.0 to 4.0 m². Phenotypic data were collected for six important seed quality traits for each DH line and parent: seed oil content (%) and protein content (%), defined as the percentages of oil and protein in the total seed dry weight, respectively; three important fatty acid components of the seed oil, namely erucic acid content (%), linolenic acid content (%), and stearic acid content (%); and the content of glucosinolates in the total seed dry weight (µmol/g). The quality traits were determined by near infrared reflectance spectroscopy, measuring three technical and three biological replicates. Details of the phenotyping are given in previous studies [10, 11, 22].

A total of 117 genetically independent B. napus inbred lines were used to validate the prediction accuracy of models trained on the TN DH population. The validation population was developed based on hundreds of crosses between B. rapa and B. carinata accessions [55, 56]. It was grown in one semi-winter environment (Wuhan, China) in 2013–2014 in a trial with three replicates. Every plot comprised two rows with a total plot size of 2.0 to 3.0 m². Seed oil content and protein content were measured using the same method as for the TN DH population.

Phenotypic data analyses

The best linear unbiased estimates (BLUEs) of phenotypic values and the variance components were estimated with a linear mixed model using ASReml-R software [57]. To obtain BLUEs, the genotype effects were treated as fixed effects and the other effects were treated as random. To estimate variance components, all effects were treated as random. Broad-sense heritability was calculated as the ratio of genotypic to phenotypic variance:

$$h^{2} = \frac{\sigma_{G}^{2}}{\sigma_{G}^{2} + \frac{\sigma_{G \times E}^{2}}{NE} + \frac{\sigma_{E}^{2}}{NE \cdot NR}}$$

where NE refers to the number of environments, NR is the average number of replications per location, $\sigma_{G}^{2}$ is the genotypic variance, $\sigma_{G \times E}^{2}$ is the variance of genotype times environment interaction, and $\sigma_{E}^{2}$ refers to the error variance.
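As a worked check of the heritability definition, the short Python sketch below recomputes the broad-sense heritability from the variance components reported in Table 1. It assumes the usual entry-mean form, in which the interaction variance is divided by the number of environments and the error variance by the number of environments times replications. The function name and the value NR = 3 (taken from the three-replicate field design) are assumptions made for illustration.

```python
def broad_sense_heritability(var_g, var_gxe, var_e, n_env, n_rep):
    """Entry-mean heritability: var_g / (var_g + var_gxe/NE + var_e/(NE*NR))."""
    phenotypic = var_g + var_gxe / n_env + var_e / (n_env * n_rep)
    return var_g / phenotypic


# Variance components as reported in Table 1.
# n_rep = 3 is an assumption based on the three-replicate field design.
h2_oil = broad_sense_heritability(2.64, 0.71, 1.17, n_env=11, n_rep=3)
h2_protein = broad_sense_heritability(0.53, 0.42, 0.61, n_env=5, n_rep=3)

print(round(h2_oil, 2))      # 0.96
print(round(h2_protein, 2))  # 0.81
```

Both values agree with the heritability estimates listed in Table 1, which suggests that the rounded variance components are consistent with this form of the formula.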

Genotypic data analyses

The 202 DH lines of the TN DH population and the two parents were previously fingerprinted using a 60k SNP array based on an Illumina Infinium assay [52]. During quality control, markers were removed if they were monomorphic or had a missing rate >5%, a minor allele frequency <5%, or a degree of heterozygosity >5% in the DH population. After quality control, 180 DH lines with 13,678 high-quality SNP markers remained. The sequences of the 13,678 SNPs were aligned to the B. napus reference genome “Darmor-bzh” version 4.1 [58] via BLAST analysis. With the parameters 100% alignment, E value <10⁻²⁰, and mismatch <2, 9,628 SNP markers could be assigned a unique physical position in the genome (S2 Table). After removing redundant SNPs in full linkage disequilibrium (LD), 1,527 markers representing recombination loci (referred to as representative markers) remained (S2 Table). These comprised 1,052 markers representing 1,052 genetic bins and 475 single markers. From each genetic bin, the marker with the lowest missing rate and the best available physical alignment position was selected as the representative marker. The resulting 1,527 representative markers were used for the subsequent analyses. Pairwise LD between markers was calculated as the squared Pearson moment correlation coefficient using the R package genetics [59]. The 117 lines of the validation population were genotyped using the same SNP array, and the 1,527 representative markers selected in the TN DH population were used for prediction.
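The quality-control thresholds and the LD measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: it assumes genotypes coded as 0/1/2 allele counts with 1 for heterozygous calls and NaN for missing calls, and the function names `filter_markers` and `ld_r2` are hypothetical.

```python
import numpy as np


def filter_markers(geno, max_missing=0.05, min_maf=0.05, max_het=0.05):
    """Return a boolean mask of markers passing the quality filters.

    geno: lines x markers array coded 0/1/2 (1 = heterozygous), NaN = missing.
    Monomorphic markers fail automatically because their MAF is 0.
    """
    missing = np.isnan(geno).mean(axis=0)
    called = (~np.isnan(geno)).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        allele_freq = np.nansum(geno, axis=0) / (2 * called)
        het = (geno == 1).sum(axis=0) / called
    maf = np.minimum(allele_freq, 1 - allele_freq)
    return (missing <= max_missing) & (maf >= min_maf) & (het <= max_het)


def ld_r2(a, b):
    """Pairwise LD as the squared Pearson correlation of two markers."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    r = np.corrcoef(a[ok], b[ok])[0, 1]
    return r * r


# Toy data (invented for illustration): 20 lines, 5 markers.
base = np.array([0.0] * 10 + [2.0] * 10)           # passes all filters
mono = np.zeros(20)                                 # monomorphic
gappy = base.copy()
gappy[[0, 10]] = np.nan                             # 10% missing
hetero = np.array([0.0] * 8 + [1.0] * 4 + [2.0] * 8)  # 20% heterozygous
demo = np.column_stack([base, mono, gappy, base, hetero])

mask = filter_markers(demo)
print(mask.tolist())                # [True, False, False, True, False]
print(ld_r2(demo[:, 0], demo[:, 3]))
```

Markers 0 and 3 in the toy data are identical, so they are in full LD; in the study, such redundant markers were collapsed into one genetic bin with a single representative marker.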

QTL mapping and genome-wide prediction

For the QTL mapping, the SNP markers were coded according to the F∞ metric [60]. The genome-wide QTL mapping method is based on the inclusion of cofactors [7], which were selected by stepwise multiple linear regression using the Bayesian information criterion [61]. In the genome-wide scan, the full model comprising the SNP and all cofactors was compared with a reduced model including only the cofactors. We used a false-discovery rate (FDR) of P<0.1 to test for significance. The proportion of the phenotypic variance explained (PVE) by all QTLs was estimated as the adjusted R² of a multiple regression [62]. We performed a five-fold cross-validation of the QTL mapping, in which the total population of 180 DH lines was randomly divided, with 100 replications, into two groups at a ratio of 4:1: 144 lines were used as the training set and the remaining 36 as the test set. QTL mapping was performed in each training set, and the estimated QTL effects were used to predict the genetic values of the lines of the test set. The prediction accuracy was defined as the correlation between the predicted and observed phenotypic values, standardized by the square root of the heritability.

For the genome-wide prediction, four different models were used in this study. We implemented three methods exploiting additive marker effects: genomic best linear unbiased prediction (GBLUP), ridge regression best linear unbiased prediction (RR-BLUP) [63], and BayesCπ [64]. To accelerate computation and eliminate the impact of LD on the prediction accuracy of BayesCπ, we removed SNPs with r²>0.95. For BayesCπ, the Gibbs sampler was run for 20,000 iterations, and the first 6,000 cycles were discarded as burn-in. We also implemented an extended GBLUP model, denoted EG-BLUP, which models digenic epistatic effects in addition to additive effects [65].
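To make the difference between GBLUP and EG-BLUP more concrete, the following Python sketch builds an additive genomic relationship matrix and an additive-by-additive kernel, then predicts untested lines with a kernel BLUP. Several choices here are assumptions rather than details taken from the text: markers are coded -1/+1 for the two homozygous classes of DH lines, the epistatic kernel is approximated by the Hadamard product of the additive kernel with itself, and the kernel weight and shrinkage parameter are fixed by hand, whereas in practice the variance components would be estimated (for example by REML). The exact formulation used in the study is given in [65] and may differ.

```python
import numpy as np


def additive_kernel(markers):
    """Genomic relationship matrix from a lines x markers matrix coded -1/+1."""
    centered = markers - markers.mean(axis=0)
    return centered @ centered.T / markers.shape[1]


def epistatic_kernel(g):
    """Additive x additive kernel, approximated here by the Hadamard product."""
    return g * g


def kernel_blup(kernel, y, train, test, lam):
    """Predict test lines from training lines with a single-kernel BLUP.

    lam plays the role of the ratio of error to genetic variance (assumed known).
    """
    mu = y[train].mean()
    k_tt = kernel[np.ix_(train, train)]
    k_st = kernel[np.ix_(test, train)]
    alpha = np.linalg.solve(k_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + k_st @ alpha


# Simulated toy data (invented for illustration only).
rng = np.random.default_rng(0)
markers = rng.choice([-1.0, 1.0], size=(30, 200))
y = markers[:, :5].sum(axis=1) + rng.normal(size=30)
train, test = np.arange(24), np.arange(24, 30)

G = additive_kernel(markers)
H = epistatic_kernel(G)

pred_gblup = kernel_blup(G, y, train, test, lam=1.0)
# Hypothetical fixed weight of 0.5 on the epistatic kernel.
pred_egblup = kernel_blup(G + 0.5 * H, y, train, test, lam=1.0)
print(pred_gblup.round(2), pred_egblup.round(2))
```

The only structural difference between the two predictions is the kernel: GBLUP uses the additive relationships alone, while the extended model adds a second kernel that captures pairwise marker interactions.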
The accuracies of all these genome-wide prediction methods were determined based on the adjusted entry means of the 180 genotypes applying five-fold cross-validation. Details of the implementation of the models have been described elsewhere [41, 42, 65]. We performed 100 cross-validation runs and estimated the accuracy as the Pearson correlation coefficient between predicted and observed values, standardized by the square root of the heritability.

To evaluate the dependence of prediction accuracy on training set size, we applied cross-validation with randomly selected subsets of n (n = 48, 80, 112, 144) lines from the full data as the training set and used the remaining lines as the test set. To evaluate the dependence of prediction accuracy on marker density, we selected subsets of m (m = 100, 1,000, 5,000, 13,678) evenly distributed markers from the full dataset and applied five-fold cross-validations using all 180 lines. The sampling procedure was repeated 100 times for each scheme, and the prediction accuracies were averaged across the 100 cross-validation runs. In these analyses of marker subsets and training set sizes, we focused on seed oil content and protein content. These traits were selected because oil content was evaluated in a large number of environments (11) and protein content exhibited a high heritability.

We also evaluated the prediction accuracy using an independent validation population. The marker effects were estimated with RR-BLUP in the TN DH population and were used to predict the performance of the 117 lines of the validation population. The prediction accuracy was again estimated as the Pearson correlation coefficient between predicted and observed values, standardized by the square root of the heritability. Heritability was estimated using the variance components estimated for the TN DH population.
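The cross-validation scheme described above can be sketched as follows. This Python example is an illustration under stated assumptions, not the implementation used in the study: the ridge solver is a generic dual-form solution, the shrinkage parameter is set from an assumed heritability rather than estimated, the per-fold correlations are averaged within each run, and the data are simulated with independent markers. All function names are hypothetical.

```python
import numpy as np


def rr_blup_fit(x, y, lam):
    """Ridge regression on centered markers, solved in the dual form
    (cheap when there are many more markers than lines)."""
    mu = y.mean()
    x_mean = x.mean(axis=0)
    xc = x - x_mean
    alpha = np.linalg.solve(xc @ xc.T + lam * np.eye(len(y)), y - mu)
    return mu, x_mean, xc.T @ alpha


def rr_blup_predict(model, x_new):
    mu, x_mean, beta = model
    return mu + (x_new - x_mean) @ beta


def standardized_accuracy(pred, obs, h2):
    """Pearson correlation divided by the square root of the heritability."""
    return np.corrcoef(pred, obs)[0, 1] / np.sqrt(h2)


def five_fold_accuracy(x, y, h2, lam, rng, n_folds=5):
    """One cross-validation run: mean standardized accuracy over the folds."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        model = rr_blup_fit(x[train], y[train], lam)
        pred = rr_blup_predict(model, x[test])
        accs.append(standardized_accuracy(pred, y[test], h2))
    return float(np.mean(accs))


# Simulated data (invented): 180 lines, 500 independent markers coded -1/+1.
rng = np.random.default_rng(1)
n_lines, n_markers, h2 = 180, 500, 0.8
x = rng.choice([-1.0, 1.0], size=(n_lines, n_markers))
g = x @ rng.normal(size=n_markers)
y = g + rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=n_lines)

# Assumed shrinkage: ratio of error variance to per-marker variance.
lam = n_markers * (1 - h2) / h2

# 10 runs here for speed; the study used 100 runs.
demo_accuracy = float(np.mean(
    [five_fold_accuracy(x, y, h2, lam, rng) for _ in range(10)]
))
print(round(demo_accuracy, 2))
```

With 180 lines and five folds, each run trains on 144 lines and tests on 36, matching the 4:1 split described in the text. Because the correlation is divided by the square root of the heritability, standardized accuracies can exceed the raw correlation and are meant to estimate the correlation with the true genetic values.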

Results

Intensive field evaluation of the TN DH population resulted in high-quality phenotypic data

We combined the information on seed protein content with previously published data for other seed quality traits of the TN DH population. We observed a wide variation of BLUEs approximating a normal distribution for most traits, except for erucic acid content (Fig 1, S3 Table). The analyses across environments revealed significant (P<0.001) variances for genotypes, environments, and interactions between genotypes and environments (Table 1). Broad-sense heritability estimates were high for the six traits, ranging from 0.81 for protein content to 0.98 for erucic acid content. Consequently, the intensive phenotyping resulted in high-quality data representing an excellent source for dissecting the genetic basis of the six traits.
Fig 1

Distributions and pairwise correlations for Best Linear Unbiased Estimates of six seed traits evaluated for 202 lines of the TN DH population in multi-environmental field trials.

All correlations passed significance tests with P-values less than 0.001 except for the correlation between protein content and erucic acid, glucosinolates, and stearic acid content.

Table 1

Estimates of variance components (σ2) and broad-sense heritability (h2) for the TN DH population with 202 lines evaluated for six seed traits in multi-environmental field trials.

Source* / Trait | Oil content | Protein content | Erucic acid content | Linolenic acid content | Stearic acid content | Glucosinolate content
σ²G | 2.64 | 0.53 | 198.19 | 0.04 | 0.08 | 229.38
σ²G×E | 0.71 | 0.42 | 11.84 | 0.01 | 0 | 120.82
σ²E | 1.17 | 0.61 | 7.78 | 0.02 | 0.01 | 54.55
Heritability | 0.96 | 0.81 | 0.98 | 0.82 | 0.94 | 0.9
Mean | 42.76 | 21.69 | 24.91 | 8.79 | 0.81 | 74.16
Range | 38.87–47.35 | 19.13–24.3 | 0.77–46.76 | 8.13–9.55 | 0.27–1.50 | 30.31–101.17
Nr. of environments | 11 | 5 | 5 | 2 | 2 | 6

*All variances pass a significance test with P values less than 0.001.

In total, 80% of the pairwise trait comparisons were significantly (P<0.001) associated with Pearson moment correlation coefficients ranging from -0.84 between erucic acid content and stearic acid content to 0.66 between erucic acid content and glucosinolate content (Fig 1). Interestingly, protein content was only poorly associated with erucic acid, glucosinolate, and stearic acid content. This lack of associations points to independent biochemical pathways and genes controlling the two classes of traits.

Large differences in the complexity of the genetic architecture of the six seed quality traits

Altogether, 151 SNP markers passed the FDR significance level of P<0.1 in the genome-wide QTL mapping scan (Figs 2 and S1). The QTL numbers for the six traits ranged from 8 to 59 and were distributed across 19 chromosomes of B. napus. Phenotypic variance explained by a single putative QTL exceeded 5% for 27 SNPs and reached 45% for a QTL located on chromosome C03 controlling erucic acid content (Table 2). A second major QTL was detected on chromosome A08 for erucic acid content, explaining 31% of the phenotypic variance. However, the majority of the QTLs, especially those influencing oil and protein content, exhibited only minor effects. Among the detected QTLs, seven were putative pleiotropic QTLs influencing two traits. For instance, the marker “Bn-scaff_15794_1-p347392”, which was physically aligned to C03 and detected as a putative pleiotropic QTL, explained 26% and 45% of the phenotypic variance for stearic acid and erucic acid concentration, respectively.
Fig 2

Manhattan plots based on composite interval QTL mapping for the six seed quality traits.

The x-axis represents the physical position of each of the 13,678 SNPs across the genome, from chromosome A01 to A10 and C01 to C09. Markers without a unique alignment to the reference genome are arranged in the axis section labeled “not assigned”. The y-axis represents the false-discovery rate (FDR) of each QTL, indicating the significance for QTL calling. The PVE, i.e. the proportion of the phenotypic variance explained by each QTL, is listed in Table 2.

Table 2

Significant marker-trait associations and the proportion of explained phenotypic variance (PVE) detected in a genome-wide association mapping approach for six quality traits of the TN DH population.

Trait | No. | Marker | P value | PVE | Genetic bin code¹ | Chr. | Physical position (bp)² | Detected in previous studies
Oil content1Bn-A10-p58691754.09E-144.27612A105499050TN-qOC-A10-1 (Jiang et al. 2014)[10]
2Bn-A09-p7390889.01E-102.96551A09131541
3Bn-A07-p163791353.29E-050.01485A0718348848SG-qOC-A7 (Zhao et al. 2012)[66]
4Bn-A07-p158021745.96E-060.7single markerA07NA3
5Bn-A04-p16846955.22E-115.86806C0425032235
6Bn-A01-p277746662.64E-060.5single markerC0138105589
7Bn-scaff_15695_1-p2948943.31E-050.25854C0529927748
8Bn-scaff_16361_1-p9300640.0001421.23956C08NA3
9Bn-scaff_20942_1-p4401065.21E-069.49694C02NA3
10Bn-scaff_17637_1-p2044396.89E-062.74945C0812326547
11Bn-scaff_16565_1-p11693204.06E-081.18698C0212445051TN-qOC-C2-2 (Jiang et al. 2014)[10]
12Bn-scaff_15838_1-p22535036.36E-092.94660C012629345TN-qOC-C1-1 (Jiang et al. 2014)[10]
13Bn-A05-p13084717.76E-070.06309A051423576SG-qOC-A5 (Zhao et al. 2012)[66]
14Bn-Scaffold000217-p201682.05E-050.37361C05NA3
15Bn-scaff_20901_1-p17055745.59E-134.91839C052309449
16Bn-scaff_23761_1-p2496284.09E-1516.44single markerC0357481703TN-qOC-C3-3 (Jiang et al. 2014)[10]
17Bn-A02-p277997271.92E-050.93139A0224756539DY-qOC-A2-2 (Delourme et al. 2006)[1]; Z5-qOC-A2-1 (Sun et al. 2012)[67]
18Bn-scaff_16231_1-p22132391.30E-135.75949C0820090489
19Bn-A03-p153971874.40E-073.81195A0314446606TN-qOC-A3-3 (Jiang et al. 2014)[10]
20Bn-scaff_16545_1-p2383974.80E-166.43508C0814155605
21Bn-A06-p79491471.09E-091.35388A06NA3
22Bn-scaff_16130_1-p10134453.94E-081.79911C0728755038
23Bn-scaff_16130_1-p10394527.23E-062.28911C0728772215
24Bn-A06-p241328421.79E-072.49421A0623129285Z5-qOC-A6-1 (Sun et al. 2012)[67]
25Bn-scaff_22728_1-p3577894.28E-060.88160C036154024TN-qOC-C3-3 (Jiang et al. 2014)[10]; OIL.C3.s.1(Niklas Körber et al.2016)[68]
26Bn-A03-p7642743.67E-060.06141A03632475
27Bn-scaff_18936_1-p8902861.50E-061.48731C033419666OIL.C3.s.1(Niklas Körber et al. 2016)[68]
Protein content1Bn-scaff_15838_3-p2567678.37E-117.13121A02NA3
2Bn-A03-p212258467.85E-080.08211A0319974471
3Bn-A04-p126701297.18E-051.96269A0413394800qThrC-4-2(Xu et al.2015)[69]
4Bn-A03-p201504794.68E-062.65209A0319014117
5Bn-scaff_16361_1-p3004358.14E-060.563A0111871025
6Bn-A09-p51901803.69E-050.02556A094862135
7Bn-scaff_17526_1-p8604599.60E-055.21977C091679866qMetC-19-9(Xu et al.2015)[69]
8Bn-scaff_16449_1-p2515269.17E-070.42709C02NA3
9Bn-A09-p335950111.08E-112.85single markerNA3NA3
10Bn-A01-p80582557.87E-179.4950A017238500qPC-1(Huang et al.2016)[70]
11Bn-A09-p159751382.03E-061.91567A09NA3
12Bn-scaff_17119_1-p3496226.35E-2010.98778C03NA3
13Bn-scaff_17119_1-p4141426.82E-155.37778C0357158030
14Bn-scaff_27815_1-p3674036.45E-052.65431A071626003
15Bn-A01-p271256494.28E-051.3587A01NA3
16Bn-Scaffold000217-p382767.56E-051.78361C05NA3
17Bn-scaff_20901_1-p6472701.44E-1310.05840C053389245qAlaC-15-4 (Wen et al.2015)[71]
18Bn-scaff_16231_1-p22132395.32E-058.59949C0820090489
19Bn-scaff_23799_1-p67823.97E-061.76single markerNA3NA3
20Bn-scaff_22728_1-p3490777.20E-072.77160C036162734qMetC-13-6 (Xu et al.2015)[69]
Erucic acid1Bn-scaff_15803_1-p8008741.11E-070.3658C0114815203
2Bn-scaff_15747_1-p1679542.16E-060.08675C01NA3
3Bn-scaff_19614_1-p360231.66E-060.11675C0113532546
4Bn-A03-p248971114.39E-051.08223A03NA3
5Bn-scaff_18039_1-p2060424.22E-112.08682C0133006005
6Bn-scaff_15844_1-p1192161.68E-120.63single markerC0133345268
7Bn-A03-p236099343.90E-060.1219A03NA3
8Bn-A01-p36646988.21E-051.9533A01NA3
9Bn-scaff_16397_1-p219612.35E-070.38885C0632884939ERA.C6.s.1(Niklas Körber et al.2016)[68]
10Bn-scaff_15794_1-p3473922.97E-2445.35775C0355942754qC3-3(Wang et al.2015)[11]
11Bn-A09-p197185819.53E-150.63568A09NA3
12Bn-scaff_17984_1-p1239185.88E-053.54569A09NA3
13Bn-A08-p132213805.64E-0631.31510A0810967853qA8-5(Wang et al.2015)[11]
14Bn-C14160250-p36875.39E-100.98509A08NA3
15Bn-A06-p76367294.83E-070.11388A067278355
16Bn-A06-p74594283.40E-050.69single markerA06NA3
17Bn-A03-p148112041.05E-070.03191A03NA3
18Bn-A03-p81776951.19E-101.91173A037472584
Linoleic acid1Bn-A02-p18909131.40E-130.23655NA3NA3
2Bn-A02-p24514707.90E-171.76655NA3NA3
3Bn-A02-p71054351.89E-140.22single markerA024150237
4Bn-A10-p5108461.01E-128.55610A102998077
5Bn-A10-p11933362.13E-093.29609A10NA3
6Bn-A10-p20926127.56E-110.11single markerA10NA3
7Bn-A09-p35466191.99E-224.68single markerC041322839
8Bn-A02-p108500123.55E-069.07118A027665679
9Bn-A02-p121456071.07E-110.12123A02NA3
10Bn-A09-p208634591.06E-201.99single markerNA3NA3
11Bn-A09-p16319447.56E-055.13554A092294691LIA.A9.w.1(Niklas Körber et al.2016)[68]
12Bn-scaff_22749_1-p2503192.31E-070.6129C0226414418
13Bn-A07-p168466247.65E-151.68485A0718775516
14Bn-A01-p279685841.06E-260.93single markerNA3NA3
15Bn-A02-p184386910.0001623.28single markerA02NA3
16Bn-A02-p190709581.17E-051.08131A0218106292
17Bn-scaff_18855_1-p7954321.72E-140.06757C0331378910
18Bn-scaff_16135_1-p1969220.000320.03532A0815018355
19Bn-A10-p157426891.45E-292.59648A1015807427
20Bn-A05-p181470402.03E-111.09248A0912040388
21Bn-scaff_16372_1-p196650.0001941.27769C0348510330qC3-2(Wang et al.2015)[11]
22Bn-scaff_20294_1-p4382936.85E-063.77887C06NA3
23Bn-A10-p154429759.97E-222.37652A1016049734
24Bn-scaff_17799_1-p27734261.55E-06~0.00990C0939884740
25Bn-A09-p335423345.40E-050.83single markerA09NA3
26Bn-A01-p220163535.40E-395.4476A0118645502
27Bn-A05-p1145982.39E-060.01single markerA05128946
28Bn-A01-p98105524.72E-330.04single markerA018432723
29Bn-A01-p81081781.57E-221.9650A01NA3
30Bn-scaff_17821_1-p210533.33E-103.01777C0356695853qC3-3(Wang et al.2015)[11]; qC18:2-13-5(Wen et al.2015)[71]
31Bn-A09-p196884765.00E-070.02568A09NA3
32Bn-A05-p4722711.39E-060.37307A05583644
33Bn-scaff_15838_1-p22535033.85E-127.51660C012629345qC18:2-11-3(Wen et al.2015)[71]
34Bn-scaff_15585_1-p10207643.37E-080.96279C0444431942
35Bn-scaff_15676_1-p3415082.92E-292.23858C05NA3
36Bn-scaff_19170_1-p11076194.39E-060.0810C0418803084
37Bn-scaff_19170_1-p5883568.41E-150.8510C04NA3
38Bn-A10-p33291311.54E-083.3433A074431854
39Bn-Scaffold000164-p1204596.78E-070.2283A0120722278
40Bn-scaff_21956_1-p1607103.46E-110.72821C0439249761
41Bn-scaff_16876_1-p1715103.75E-180.11817NA3NA3
42Bn-scaff_16876_1-p3030061.11E-160.46816C0434584781
43Bn-A01-p248301115.08E-130.4382A0120561797
44Bn-A08-p165620351.20E-060.28522A0814030898qA8-5(Wang et al.2015)[11]; LIA.A8.w.1(Niklas Körber et al.2016)[68]
45Bn-scaff_16069_1-p37804944.12E-120.04929C0740184750
46Bn-scaff_16545_1-p1103424.09E-050.01508A087961925qA8-6(Wang et al.2015)[11]
47Bn-scaff_27204_1-p15446.30E-050.421045C07NA3
48Bn-A08-p112124941.06E-050.47509A08NA3
49Bn-scaff_16130_1-p10394520.0002530.07911C0728772215
50Bn-scaff_15705_1-p18181772.15E-110.29918C0735089587
51Bn-A02-p29622985.75E-130.09single markerA02NA3
52Bn-A03-p108839306.17E-100.081051A03NA3
53Bn-scaff_19111_1-p1773440.0003990.2744C0310482820
54Bn-A03-p90987732.74E-090.39173A038405389
55Bn-scaff_23799_1-p67821.32E-140.88single markerNA3NA3
56Bn-A03-p24913461.29E-090.24150A032036063
57Bn-A01-p240204512.95E-170.15single markerA0120087415
58Bn-A03-p7642745.12E-060.73141A03632475
59Bn-A02-p22877124.39E-101.09single markerNA3NA3
Stearic acid1Bn-A10-p94362055.05E-091.5single markerA1010869232qA10-2(Wang et al.2015)[11]
2Bn-A10-p19192933.42E-060.04607A101780462
3Bn-scaff_15747_1-p3960801.09E-070.85677C0114488446
4Bn-A01-p26886622.70E-057.7427A012194542qA1-5(Wang et al.2015)[11]
5Bn-scaff_17423_1-p1003180.0001190.48single markerA09NA3
6Bn-A01-p154971904.41E-05557A0112941895qA1-5(Wang et al.2015)[11]
7Bn-Scaffold000178-p335871.95E-071.45single markerA09NA3
8Bn-scaff_15794_1-p3473925.21E-4125.67775C0355942754qC3-3(Wang et al.2015)[11]
9Bn-A04-p165280104.43E-050.47288A0416689032
10Bn-A04-p173585197.05E-072.44single markerC0446402393
11Bn-A06-p1123394.94E-084.83364A06NA3
12Bn-Scaffold000217-p60258.40E-126.06361A0522747105
13Bn-scaff_16614_1-p3735131.71E-110.59single markerNA3NA3
14Bn-A03-p22970797.65E-081.94single markerA031863826
15Bn-A01-p280478721.18E-071.78single markerNA3NA3
16Bn-A08-p132398163.77E-0821.16510A0810991898qA8-5(Wang et al.2015)[11]
17Bn-scaff_15699_1-p5779144.02E-050.49509C0816728901
18Bn-A06-p238653562.19E-091.55420A0622856806
19Bn-A03-p89248521.16E-123.39173A038233061
Glucosinolate1Bn-scaff_15747_1-p1085962.11E-0613676C0114196095TN-q.mcG-C1d(Feng et al.2012)[9]
2Bn-scaff_19168_1-p316124.39E-051.4279C0136005339
3Bn-A04-p122594991.21E-050.24269A0413248240TN-q.mcG-A4c(Feng et al.2012)[9]
4Bn-A04-p139307131.36E-071.16single markerA04NA3
5Bn-scaff_15918_1-p2299873.20E-102.86722C0242160463TN-q.mcG-C2b(Feng et al.2012)[9]
6Bn-scaff_15794_1-p4378641.45E-2518.07774C0355837809TN-q.mcG-C3c(Feng et al.2012)[9]
7Bn-C14160250-p36872.05E-1917.14509A08NA3
8Bn-A03-p78380702.94E-073.97170A037130138 TN-cqS-Aro-GST-A3a(Feng et al.2012)[9]
Total/average151 1.94E-053.221133333    

1 More detailed information of each genetic bin is listed in S2 Table.

2 The physical position is presented by the start position of each SNP with unique position to the reference genome of B. napus, Darmor-bzh 4.1, and more information is also available in S2 Table.

3 Not available because of absent alignment or multiple alignment positions.

The FDR (false-discovery rate) significance level is P<0.1 for the detection of associated markers.

We used five-fold cross-validation to reliably estimate the potential of marker-assisted selection (MAS). The average accuracy of MAS ranged from 0.47 for protein content to 0.81 for erucic acid content (Table 3). These values were substantially lower compared to the non-cross-validated results (Table 2), underlining the need to validate findings of linkage mapping.
Table 3

Average prediction accuracy of four genomic selection methods and marker assisted selection (MAS) for six seed quality traits of the TN DH population.

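Marker type | Method | Oil content | Protein content | Erucic acid | Linolenic acid | Stearic acid | Glucosinolates | Average
13,678 SNPs | RR-BLUP | 0.74 | 0.65 | 0.81 | 0.73 | 0.78 | 0.69 | 0.73
13,678 SNPs | BayesCπ | 0.75 | 0.62 | 0.89 | 0.49 | 0.61 | 0.79 | 0.69
13,678 SNPs | EG-BLUP | 0.72 | 0.62 | 0.79 | 0.74 | 0.75 | 0.67 | 0.72
13,678 SNPs | GBLUP | 0.72 | 0.61 | 0.77 | 0.73 | 0.75 | 0.67 | 0.71
13,678 SNPs | MAS | 0.52 | 0.47 | 0.81 | 0.51 | 0.74 | 0.64 | 0.62
1,527 representative SNPs* | RR-BLUP | 0.76 | 0.66 | 0.83 | 0.75 | 0.81 | 0.72 | 0.76
1,527 representative SNPs* | BayesCπ | 0.76 | 0.64 | 0.88 | 0.45 | 0.61 | 0.79 | 0.69
1,527 representative SNPs* | EG-BLUP | 0.75 | 0.64 | 0.81 | 0.76 | 0.79 | 0.71 | 0.74
1,527 representative SNPs* | GBLUP | 0.76 | 0.64 | 0.82 | 0.75 | 0.8 | 0.72 | 0.75
1,527 representative SNPs* | MAS | 0.59 | 0.45 | 0.74 | 0.54 | 0.68 | 0.62 | 0.6
1,527 random SNPs* | RR-BLUP | 0.72 | 0.63 | 0.81 | 0.72 | 0.77 | 0.69 | 0.72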

*The 1,527 representative SNPs are specifically selected from the 1,527 individual genetic bins of the TN DH population, while the 1,527 random SNPs are randomly selected from the total 13,678 polymorphic SNPs of the TN DH population. Marker assisted selection (MAS) is based on markers significantly associated with the respective traits outlined in detail in the Material and Methods.


Accuracies of genome-wide prediction in the TN DH population

We used four different models to investigate the efficiency of genome-wide prediction for the six seed quality traits. Genomic selection showed significantly higher prediction accuracies than MAS for all traits, with the most pronounced differences observed for linolenic acid, oil, and protein content (Table 3). The average prediction accuracy of RR-BLUP was the highest, while BayesCπ performed best for erucic acid and glucosinolate content. The most complex model comprising main and epistatic effects, EG-BLUP, performed best for linolenic acid content. In general, traits with high heritability could be predicted with higher accuracy than traits with low heritability. As expected for a bi-parental mapping population, a large number of markers were in tight LD and could thus be grouped into genetic bins because of the absence of recombination events. We reduced the collinearity among markers by removing redundant markers in full linkage disequilibrium, resulting in a subset of 1,527 SNP markers (S2 Table, S2 Fig). Prediction accuracy increased on average by 3% using the reduced set of 1,527 representative markers compared to genomic selection based on all SNPs (Table 3).
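The size of these differences can be checked directly against Table 3. The short Python sketch below recomputes the average gain from using the representative marker set, and also the per-trait gain of representative over random markers for RR-BLUP that is reported in the following subsection. It assumes that the percentages are relative changes in accuracy (a ratio minus one), which is an interpretation and not stated explicitly in the text.

```python
# Average accuracies across the six traits, copied from Table 3
# (the four genome-wide prediction methods only, MAS excluded).
gs_all_snps = {"RR-BLUP": 0.73, "BayesCpi": 0.69, "EG-BLUP": 0.72, "GBLUP": 0.71}
gs_representative = {"RR-BLUP": 0.76, "BayesCpi": 0.69, "EG-BLUP": 0.74, "GBLUP": 0.75}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Relative gain of the 1,527 representative markers over all 13,678 SNPs.
relative_gain = mean(gs_representative.values()) / mean(gs_all_snps.values()) - 1
print(f"{relative_gain:.1%}")  # 3.2%

# Per-trait RR-BLUP accuracies from Table 3 (oil, protein, erucic acid,
# linolenic acid, stearic acid, glucosinolates).
rr_representative = [0.76, 0.66, 0.83, 0.75, 0.81, 0.72]
rr_random = [0.72, 0.63, 0.81, 0.72, 0.77, 0.69]
per_trait_gain = mean(r / q - 1 for r, q in zip(rr_representative, rr_random))
print(f"{per_trait_gain:.1%}")  # 4.4%
```

Under this reading, both figures are consistent with the rounded values given in the text (3% and ~4%).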

Effects of marker density, training population size, and number of environments on prediction accuracy

Genome-wide prediction based on RR-BLUP performed best on average and, in addition, was computationally efficient. Therefore, we conducted comprehensive analyses of the factors driving the accuracy of genome-wide prediction exclusively based on RR-BLUP. We varied the training population size and marker density and examined the resulting accuracy of genome-wide prediction. The accuracy remained in the range of 0.44 to 0.67 for all traits using only 48 lines as the training set (Fig 3). Interestingly, prediction accuracy reached a peak with 1,000 randomly selected markers and decreased only marginally for a subset of 100 markers. The prediction accuracy increased by ~4% for all six traits when using the representative set of markers compared to the 1,527 randomly selected markers (Table 2). Thus, our results indicated that to improve the accuracy of genome-wide prediction in a bi-parental population, the population size is more important than the density of markers.
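As a rough illustration of the training-set-size experiment, the sketch below fits ridge regression with a single shrinkage parameter shared by all markers (the defining feature of RR-BLUP) on simulated data and reports the accuracy for the four training sizes. Everything here is an assumption made for illustration: the markers are simulated independently (unlike a real DH population with strong LD), the shrinkage parameter is taken from the simulated variances rather than estimated by REML, and a single random split replaces repeated cross-validation. The numbers it prints are therefore not comparable with those in Fig 3.

```python
import numpy as np


def rr_blup_predict(X_train, y_train, X_test, lam):
    """Ridge regression with one common shrinkage parameter for all markers."""
    mu = y_train.mean()
    center = X_train.mean(axis=0)
    Z = X_train - center
    # Dual form: solve an n x n system, convenient when markers >> lines.
    K = Z @ Z.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    beta = Z.T @ alpha  # estimated marker effects
    return mu + (X_test - center) @ beta


# Simulated population (hypothetical): 202 lines, 500 independent 0/1 markers.
rng = np.random.default_rng(1)
n_lines, n_markers = 202, 500
X = rng.integers(0, 2, size=(n_lines, n_markers)).astype(float)
effects = rng.normal(0.0, 1.0, n_markers)      # marker effect variance = 1
genetic = X @ effects
noise_sd = genetic.std()                        # heritability of about 0.5
y = genetic + rng.normal(0.0, noise_sd, n_lines)
lam = noise_sd ** 2 / 1.0                       # residual / marker variance

accuracy_by_size = {}
for n_train in (48, 80, 112, 144):
    order = rng.permutation(n_lines)
    train, test = order[:n_train], order[n_train:]
    pred = rr_blup_predict(X[train], y[train], X[test], lam)
    # Accuracy: Pearson correlation between predictions and observations.
    accuracy_by_size[n_train] = np.corrcoef(pred, y[test])[0, 1]
    print(n_train, round(accuracy_by_size[n_train], 2))
```

The dual form is chosen because the number of lines is far smaller than the number of markers, so the linear system stays small regardless of marker density.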
Fig 3

Average prediction accuracy of genomic selection applying RR-BLUP based on (a) varying training population sizes and (b) number of markers.

We further studied the effects of the number of environments and training population size on the accuracy of genomic selection by focusing on oil and protein content. These traits were selected because oil content was evaluated in a large number of environments (11) and protein content exhibited a high heritability. We randomly selected training sets comprising n = 48, 80, 112, and 144 lines evaluated in subsets of environments (k = 2, 3, …, 11 for oil content; k = 2, 3, 4, 5 for protein content). The accuracy was estimated as the Pearson product-moment correlation coefficient between the predicted genotypic values and the adjusted entry means of all remaining lines evaluated across all environments. This type of cross-validation allows for the study of the prediction accuracy assuming reduced phenotyping intensity. As the test set was not evaluated in any of the environments, its performance could not be estimated by phenotypic correlations between environments. The prediction accuracies based on phenotypic data from only two environments were 0.73 for oil content and 0.60 for protein content (Fig 4). Compared to the accuracy obtained with the full dataset, the accuracy decreased by only 3% to 6%. The accuracy remained at 0.55 for oil content when only 48 lines and 2 environments were used.
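The accuracy measure used above is the Pearson product-moment correlation between predicted genotypic values and adjusted entry means of the test lines. A minimal sketch follows; the five values in each list are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five test lines (e.g. oil content in %).
predicted = [41.2, 43.5, 39.8, 44.1, 42.0]   # from a model trained on k environments
observed = [40.9, 44.0, 40.5, 43.2, 42.6]    # adjusted entry means, all environments
accuracy = pearson(predicted, observed)
print(round(accuracy, 2))
```

Because the correlation is computed against entry means from all environments, it reflects how well a model trained on few environments ranks lines by their overall performance.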
Fig 4

Prediction accuracy of oil content and protein content using marker data for 1,527 representative SNPs according to different numbers of environments and training set size.

Accuracies of genome-wide prediction for seed oil content and protein content validated in a diverse population of 117 B. napus lines

A panel of 117 diverse lines was genotyped and phenotyped in one environment in order to validate the prediction accuracies of seed oil content and protein content. A total of 1,148 genetic bin markers across the AC genome that were common to the two populations were identified. Since we observed the highest accuracies for RR-BLUP in the TN DH population, we also used this method for prediction. The prediction accuracy amounted to 0.14 for protein content and 0.17 for oil content based on the genetic bin markers.

Discussion

Erucic acid, stearic acid, and glucosinolate content are promising targets for marker-assisted selection

Understanding the genetic basis of seed oil yield and quality is important for efficient rapeseed breeding [10]. Previous studies revealed differences in the complexity of the genetic architecture of the six quality traits examined in our study [1, 3, 9–11, 17, 20, 22, 72], which were further substantiated using five-fold cross-validations (Table 2; S3 Table). Oil, protein, and linolenic acid content are characterized by the absence of a reliable large-effect QTL, while erucic acid, stearic acid, and glucosinolate content are to a large degree controlled by a few QTL exhibiting large effects. For instance, the major QTL located on A08 and C03 (Table 2) together explained 76.66% of the phenotypic variance for erucic acid; these QTL have been widely identified previously in the TN DH population and other mapping populations of B. napus [21, 22, 24, 73]. The major QTL located on C03 and explaining 16.44% of the phenotypic variance for oil content was identified in both the TN DH and KN DH populations [74]. The QTL with large genetic effects for total seed glucosinolates located on A08, C01, and C03 were also identified previously in this and other mapping populations [9, 19, 75]. These QTL are interesting targets for marker-assisted selection, which can be applied in rapeseed breeding in combination with the enrichment of target alleles in F2 populations prior to producing DH populations. Besides these consistently identified QTL, we also detected several new minor-effect QTL for these seed quality traits in the TN DH population that were not found in previous QTL analyses of this population [10, 11, 22], possibly because the high-density SNP markers provide greater detection power than the relatively low-density markers used previously.
For example, the QTL “Bn-Scaffold000217-p20168” on C05 and the QTL “Bn-scaff_16130_1-p1013445” and “Bn-scaff_16130_1-p1039452” on C07 were newly identified for seed oil content in this population compared to those detected in Jiang et al. (2014). It is important to note that 2,828 SNPs could not be aligned to the reference genome and therefore lack a physical position; QTL without a unique physical position could not be compared with previous studies.

Genetic architecture marginally impacts the choice of the genome-wide prediction model

Previous simulation studies revealed that equal shrinkage of marker effects as applied in RR-BLUP can be inappropriate for traits influenced by QTL exhibiting large effects [13, 76]. In these cases, Bayesian models such as BayesB or BayesCπ, which allow specific shrinkage of every marker [77], are expected to outperform RR-BLUP. The superiority of BayesB over RR-BLUP has been reported for glucosinolate content in a previous genome-wide prediction study based on a diverse panel of 391 rapeseed lines derived from nine families [31]. Superiority of Bayesian models over RR-BLUP has also been observed for flowering time in the TN DH population [50]. In accordance with these observations, prediction accuracies for erucic acid and glucosinolate content were maximized when applying BayesCπ, with improvements of 8–10% compared to RR-BLUP (Table 2). In contrast, for stearic acid, RR-BLUP outperformed BayesCπ despite the presence of large-effect QTL. This is most likely due to two reasons. First, the ratio between the phenotypic variance explained by the two large-effect QTL and that explained by the remaining small-effect QTL is approximately 1 to 1 for stearic acid content, while this ratio is 5 to 1 for erucic acid and glucosinolate content. Second, one large-effect QTL controlling stearic acid is reflected in several marker-trait associations with SNPs in tight linkage disequilibrium (r2 > 0.8), while the QTL for erucic acid and glucosinolate content are reflected by only a limited number of SNPs. Epistasis, the interaction between genes [78], is an additional potential force influencing the choice of the biometrical model for genome-wide prediction [65]. Previous linkage and linkage disequilibrium mapping studies in rapeseed indicated that epistatic effects are involved in fatty acid metabolism [11, 47]. Consequently, we implemented EG-BLUP for genome-wide prediction, which explicitly considers digenic additive-by-additive epistatic effects [65].
We observed, however, higher prediction accuracies of EG-BLUP compared to the other genome-wide prediction models only for linolenic acid content (Table 2). Moreover, the gains in prediction accuracy were only marginal. These negligible benefits are in contrast to the non-cross-validated results of previous linkage and linkage disequilibrium studies [11, 47] and point to the strong need to validate the role of epistatic effects. In summary, the accuracy of genomic selection does not crucially depend on the choice of a suitable genome-wide prediction model and is an attractive alternative to marker-assisted selection.
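To make the difference between a purely additive model and one with additive-by-additive epistasis concrete, the sketch below builds an additive genomic relationship matrix and its element-wise (Hadamard) square, which is a common way to represent digenic additive-by-additive relationships, and uses both in a kernel-based BLUP. This is a simplified illustration under stated assumptions, not the implementation used in the study: the relationship matrix is a plain centred cross-product scaled by marker number, the variance components are fixed by hand instead of being estimated, the mean is the simple training mean, and all data are simulated. The function names are hypothetical.

```python
import numpy as np


def additive_kernel(X):
    """Centred cross-product of marker scores, scaled by the number of markers."""
    Z = X - X.mean(axis=0)
    return Z @ Z.T / X.shape[1]


def epistatic_kernel(G):
    """Additive-by-additive relationships as the Hadamard product of G with itself."""
    return G * G


def kernel_blup(kernels, variances, var_e, y, train, test):
    """Predict total genetic values of test lines for fixed variance components."""
    K = sum(v * k for v, k in zip(variances, kernels))
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))
    mu = y[train].mean()
    return mu + K[np.ix_(test, train)] @ np.linalg.solve(V, y[train] - mu)


# Simulated data (hypothetical): 60 lines, 200 markers, one interacting marker pair.
rng = np.random.default_rng(7)
X = rng.integers(0, 2, size=(60, 200)).astype(float)
y = X[:, :10].sum(axis=1) + X[:, 0] * X[:, 1] + rng.normal(0.0, 1.0, 60)

G = additive_kernel(X)
H = epistatic_kernel(G)
train, test = np.arange(48), np.arange(48, 60)
# Assumed variance components: additive 1.0, epistatic 0.5, residual 1.0.
pred = kernel_blup([G, H], [1.0, 0.5], 1.0, y, train, test)
print(pred.shape)  # (12,)
```

Dropping `H` from the kernel list gives the additive-only counterpart, so the two models can be compared on the same split.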

Implementation of genome-wide prediction in rapeseed breeding

The successful implementation of genome-wide prediction in rapeseed breeding requires that a certain threshold of prediction accuracy is realized [40, 79]. Previous model studies in wheat and maize suggested a threshold for the prediction accuracy of 0.5 [80, 81]. We chose two important traits, oil content and protein content, to illustrate the size of the training population, the number of environments, and the marker density required to reach a prediction accuracy of 0.5 for the bi-parental population. In accordance with previous studies based on bi-parental populations [82–85], approximately one thousand markers were required before the prediction accuracy plateaued (Fig 3). Increasing the number of markers further introduced problems due to collinearity. Prediction accuracies were higher for a reduced set of 1,527 SNPs, which represented recombination loci in the population, than for the full 13,678-marker set (Table 3). Thus, to improve the accuracy of genome-wide prediction in a bi-parental population, the population size, which determines the number of recombination events captured, is more important than the density of markers. The number of lines has a greater impact on the prediction accuracy than the number of environments (Figs 3 and 4). The prediction accuracy already stagnates at three environments, and thus it is more efficient to invest in training population size. For protein content, approximately 144 lines evaluated in two environments were needed to reach an accuracy of 0.6. For oil content, prediction accuracy amounted to 0.6 when the training population was decreased to 80 lines and the number of environments reduced to two. These results suggest that genome-wide prediction can be successfully implemented in bi-parental populations even with small training population sizes and is an attractive complement to phenotypic selection to improve seed quality traits.
The prediction accuracy within bi-parental populations is of central importance when examining the potential to implement genome-wide prediction in breeding programs exploiting the doubled haploid technology. Moreover, it is of interest to study the potential to use the prediction model in unrelated populations. We examined an extreme validation scenario for the prediction of seed oil and protein content using a genetically diverse sample of 117 lines derived from crosses between B. rapa and B. carinata accessions [55, 56]. The prediction accuracy in this independent and genetically very distinct validation population still amounted to 0.14 for protein content and 0.17 for oil content. When interpreting the prediction accuracies, it must be considered that the validation population carries genome segments from B. rapa and B. carinata. However, the Brassica 60K SNP array used was developed based on the AC genome sequences of B. rapa, B. oleracea, and B. napus. Thus, the lack of polymorphisms unique to B. carinata is expected to impair the prediction accuracies. Taking this into consideration, our independent validation reflects the high quality of the developed calibration models even in very diverse backgrounds, highlighting the prospects of genome-wide prediction for routine rapeseed breeding programs.

Quantile-quantile plots of association mapping for six traits using different methods.

The green lines are the -log10 P-values of the linear regression method. The red lines are the -log10 P-values of the stepwise multiple linear regression method. The expected uniform distribution of -log10 P-values is indicated by the diagonal line in blue. (PDF)

Decay of linkage disequilibrium with physical distance.

Within each physical distance class, marker pairs are clustered into five groups with varying r2 values. (JPEG)

Locations, years and environments for the field experiment.

(DOCX)

The physical alignment information of the SNPs of the TN DH population to the reference "Darmor-bzh" genome of B. napus.

(XLSX)

Summary of the phenotypic data of six quality traits assessed in the TN DH population across environments.

(XLSX)