
Prediction of heterosis in the recent rapeseed (Brassica napus) polyploid by pairing parental nucleotide sequences.

Qian Wang1, Tao Yan1, Zhengbiao Long1, Luna Yue Huang2, Yang Zhu1, Ying Xu1, Xiaoyang Chen3, Haksong Pak1, Jiqiang Li4, Dezhi Wu1, Yang Xu5, Shuijin Hua6, Lixi Jiang1.   

Abstract

The utilization of heterosis is a successful strategy in increasing yield for many crops. However, it consumes tremendous manpower to test the combining ability of the parents in fields. Here, we applied the genomic-selection (GS) strategy and developed models that significantly increase the predictability of heterosis by introducing the concept of a regional parental genetic-similarity index (PGSI) and reducing dimension in the calculation matrix in a machine-learning approach. Overall, PGSI negatively affected grain yield and several other traits but positively influenced the thousand-seed weight of the hybrids. It was found that the C subgenome of rapeseed had a greater impact on heterosis than the A subgenome. We drew maps with overviews of quantitative-trait loci that were responsible for the heterosis (h-QTLs) of various agronomic traits. Identifications and annotations of genes underlying high impacting h-QTLs were provided. Using models that we elaborated, combining abilities between an Ogu-CMS-pool member and a potential restorer can be simulated in silico, sidestepping laborious work, such as testing crosses in fields. The achievements here provide a case of heterosis prediction in polyploid genomes with relatively large genome sizes.


Year:  2021        PMID: 34735437      PMCID: PMC8608326          DOI: 10.1371/journal.pgen.1009879

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   5.917


Introduction

Heterosis, which is a product of crossing two parents with different genetic backgrounds, is a common phenomenon in the biological world. The hybrid generation often displays more vigor, greater resistance to disease, better adaptability under stressful environments, and higher yield than the parents. Heterosis was first discovered in a tobacco hybridization experiment approximately 150 years ago, and it has been applied extensively for yield improvement in various field crops such as rice [1], corn [2], cotton [3], rapeseed [4], and some vegetables [5]. High-parent heterosis (HPH) and mid-parent heterosis (MPH) describe the degree of phenotypic difference between a hybrid and its better parent and between a hybrid and the average of its two parents, respectively [6]. Numerous theories have been used to explain heterosis, and the major ones are the dominance and over-dominance hypotheses. The dominance hypothesis attributes the enhanced performance of hybrids to the masking of undesirable recessive alleles from one parent by favorable dominant alleles from the other parent, and the poor performance of inbred lines to the loss of a diverse genetic basis, which is manifested by numerous homozygous loci [7]. Conversely, the over-dominance hypothesis attributes the superiority of heterozygotes to the survival of alleles that are recessive and harmful in homozygotes, and the poor performance of inbred lines to high proportions of such deleterious recessive alleles [8]. Dominance and over-dominance effects give rise to different gene expression profiles in offspring. If over-dominance is the main source of the superior adaptability under heterosis, certain genes in heterozygous individuals could be overexpressed in comparison to their homozygous parents. However, in the case of dominance, fewer genes would be downregulated in heterozygous individuals compared with their parents. 
Based on such assumptions, greater heterosis would be generated with an increase in heterozygous loci. To obtain ideal hybrids, breeders have to generate large numbers of hybrid combinations and test their performance in multiple environments over time. Genomic selection (GS) is a novel approach in which selection is based not on a few markers but on a genome-wide marker dataset. It combines marker data with phenotypic and pedigree data (if available) and attempts to accurately predict the performance of the next generation rather than to identify individual loci that are significantly associated with a trait, yielding more rapid results and reducing the costs of breeding activities. In addition, GS, which treats the entire genome sequences of the parents as valuable information for breeding assessment and captures single-site effects even if they are minimal, can shorten the breeding cycle considerably and save much time and labor. At present, the major methods for implementing GS include the genomic best linear unbiased prediction (GBLUP) method [9] and the least absolute shrinkage and selection operator (LASSO) method [10]. The Pearson correlation coefficient between the observed and predicted phenotypic values is often used as an indicator of prediction ability [11,12]. Rapeseed (Brassica napus), a typical amphidiploid species that originated from interspecific hybridization between Brassica rapa (AA, n = 10) and Brassica oleracea (CC, n = 9) only 7500 years ago [13], is of major economic importance worldwide, providing high-quality oil with excellent health-promoting properties and significant potential for non-food uses such as biofuels and bioplastics. The yield and overall production of the crop have increased significantly owing to the commercial use of hybrids in major rapeseed production areas, such as Canada, China, and Europe. 
As for many other crops, the strategy of choice for large-scale commercial production of hybrids relies on the development of cytoplasmic male-sterile parents, whose fertility can be restored by crossing with another parent carrying a restorer-of-fertility gene. In rapeseed, the Polima cytoplasmic male sterile (Pol-CMS) system is widely used in semi-winter Chinese ecotype breeding [14]. Although it is relatively easy to find restorers for Pol-CMS lines, male sterility in Pol-CMS lines is unstable under certain environmental conditions. In contrast, male sterility in the Ogura cytoplasmic male sterile (Ogu-CMS) system is much more stable and complete; nevertheless, finding restorers for Ogu-CMS lines is challenging, and it takes several years to transfer restoring genes into potential restorers. In this study, we successfully bred a series of Ogu-CMS restorers and constructed a pool of Ogu-CMS lines, which reflects the genetic diversity of the semi-winter ecotype [15]. We applied GS and developed models to predict heterosis by pairing the genome-wide nucleotide sequences of parents. Maps with an overview of quantitative trait loci for heterosis (h-QTLs), at which the parental genetic similarity index (PGSI) correlated positively or negatively with the heterosis of a specific trait, were drawn. With the GS-based predictive models that we developed, the combining ability between an Ogu-CMS-pool member and one of 1007 potential restorers could be tested in silico by pairing the nucleotide sequences of the parents. This will accelerate rapeseed breeding by saving years of effort and provides a case study of heterosis prediction in polyploid genomes with relatively large genome sizes.

Results

Heterosis of F1 hybrids of the Ogu-CMS system

We developed an Ogu-CMS pool consisting of 50 members in addition to eight Ogu-restorers for this experiment. The identifications (ID) and relevant information of the CMS and restorer lines are provided in S1 Table. The 50 Ogu-CMS lines constituted an Ogu-CMS pool that had a wide genetic diversity reflected by a principal component analysis (PCA) based on 1,057 sequenced genomes, including those of the CMS lines used in the present study. The lines represent the semi-winter ecotype in the background of a worldwide germplasm collection (S1 Fig) [15]. Crosses between the 50 CMS lines and the eight restorers yielded 400 hybrids. The hybrid lines were grown under three environmental conditions from 2017 to 2019. They demonstrated significant heterosis in terms of HPH and MPH across various agronomic traits such as plant height (PH), the number of seeds per silique (NSS), and grain yield (GY). The phenotypic data and the genetic relationship between parents and offspring are provided in S2 and S3 Tables. In addition, they exhibited significant MPH across traits such as number of branches per plant (NBP), number of siliques per plant (NSP), and thousand-seed weight (TSW) (Fig 1). GY-HPH and GY-MPH values of the top 10% hybrid lines were 90.16% and 146.33%, respectively, whereas, the GY-HPH and GY-MPH values of the top 1% hybrid lines were as high as 168.81% and 233.01%, respectively (S4 Table).
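The two heterosis measures reported above can be illustrated with a minimal sketch. It assumes the commonly used percentage forms, MPH = (F1 - MP) / MP x 100 and HPH = (F1 - HP) / HP x 100, and it assumes that the "better" parent is the one with the higher trait value; the exact formulas used in the study are not given in this section, and the function names and numbers below are hypothetical.

```python
from statistics import mean

def mid_parent_heterosis(f1: float, p1: float, p2: float) -> float:
    """Percent deviation of the hybrid from the mean of its two parents."""
    mid_parent = mean([p1, p2])
    return (f1 - mid_parent) / mid_parent * 100.0

def high_parent_heterosis(f1: float, p1: float, p2: float) -> float:
    """Percent deviation of the hybrid from its higher-valued parent.

    Assumption: a higher trait value is the better one (as for grain yield).
    """
    high_parent = max(p1, p2)
    return (f1 - high_parent) / high_parent * 100.0

# Hypothetical grain yields in arbitrary units, not data from the study.
f1, p1, p2 = 30.0, 20.0, 10.0
print(mid_parent_heterosis(f1, p1, p2))   # 100.0
print(high_parent_heterosis(f1, p1, p2))  # 50.0
```

Under these definitions MPH is never smaller than HPH for the same cross, which is in line with the GY-MPH values above being larger than the GY-HPH values.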
Fig 1

Comparison of agronomic-trait performance between the F1 hybrids and their parents.

PH: plant height, NBP: number of branches per plant, NSP: number of siliques per plant, NSS: number of seeds per silique, TSW: thousand-seed weight, and GY: grain yield. P1 represents female parents, and P2 represents male parents. The values indicate the significance of pairwise comparisons.

To estimate the influence of parents on heterosis, we calculated the correlation coefficients between the phenotypic values of the parents and those of the hybrids. For most traits, the correlations between the hybrids and their male parents were higher than those between the hybrids and their female parents (S5 Table). Notably, the correlation coefficient of PH between the hybrids and male parents was the highest (r = 0.64), indicating a higher impact of the male parents on the PH of the hybrids. Furthermore, we compared the correlations among the six agronomic traits of the hybrids. There were relatively high positive correlations between GY and NSP (r = 0.63), NSP and NBP (r = 0.46), NSS and PH (r = 0.31), and relatively high negative correlations between TSW and NSS (r = -0.31), TSW and NSP (r = -0.15), and NSP and NSS (r = -0.21) (Fig 2B and S6 Table). Among the six traits, NSP had the greatest correlation with GY, suggesting a considerable influence of silique number on yield heterosis in rapeseed (Fig 2A). Overall, the heterosis of the hybrids in the semi-winter ecotype with the Ogura system was significant and attractive.
Fig 2

Correlations between the phenotypic value of agronomic traits (left) and the contribution of a trait to grain yield (right). PH: plant height, NBP: number of branches per plant, NPB: number of primary branches per plant; NSB: number of secondary branches; NSP: number of siliques per plant, NSS: number of seeds per silique, TSW: thousand-seed weight, GY: grain yield. (A) Contribution of a specific trait to the grain yield. R represents the correlation coefficient and P represents the significance value. (B) Pairwise correlations among the phenotypic values of agronomic traits. The sectors indicate the positive or negative values of the correlations. The darker the sectors, the greater the absolute values. The number inside the box represents the correlation coefficient. Changes in color from dark red to dark blue correspond to changes in correlation coefficient from -1 to +1.


Correlation between parental genetic similarity index (PGSI) and F1 heterosis

To determine how parental-sequence similarity potentially influences hybrid vigor, we calculated the correlations between the genome-wide PGSI and heterosis. A total of 4.44 million SNPs were obtained across the paired parental genomes by mapping reads to the reference genome [13]. Overall, PGSI negatively influenced GY, NSS, NSP, and PH, but positively influenced TSW, regardless of the type of heterosis (HPH or MPH) (Fig 3). The absolute values of the correlation coefficients between PGSI and NSS-HPH, TSW-HPH, PH-MPH, and NSS-MPH were relatively high (S7 Table).
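How a regional similarity index could be derived from paired parental SNPs can be sketched as follows. The formula for PGSI is not given in this section, so the sketch assumes a simple definition: within each fixed-size window, PGSI is the fraction of SNP sites at which the two parents carry the same genotype call. The function name, the genotype coding, and the toy data are hypothetical.

```python
from collections import defaultdict

def regional_pgsi(positions, parent_a, parent_b, window=100_000):
    """Return {window_index: fraction of SNPs with identical genotypes}.

    positions          : SNP coordinates (bp) on one chromosome
    parent_a, parent_b : genotype calls at those SNPs (e.g. 0/1/2 alt-allele
                         counts); None marks a missing call
    window             : bin size in bp (assumed definition of a "region")
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for pos, ga, gb in zip(positions, parent_a, parent_b):
        if ga is None or gb is None:      # skip sites with a missing call
            continue
        idx = pos // window               # window that contains this SNP
        total[idx] += 1
        same[idx] += int(ga == gb)
    return {idx: same[idx] / total[idx] for idx in sorted(total)}

# Toy data (made up): five SNPs on one chromosome.
positions = [1_200, 40_000, 99_999, 150_000, 180_000]
cms_line = [0, 2, 2, 0, 2]
restorer = [0, 2, 0, 2, 2]
pgsi = regional_pgsi(positions, cms_line, restorer)
print(pgsi)
```

In this toy case the first 100 Kb window holds three SNPs of which two match (PGSI of 2/3), and the second holds two SNPs of which one matches (PGSI of 1/2). Each cross then contributes one value per window rather than one value per SNP, which is the dimension reduction the text refers to.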
Fig 3

Effects of parental genetic similarity index (PGSI) on heterosis of traits.

The pink and green squares represent negative and positive effects, respectively. The darker the squares, the larger the absolute value of the correlation coefficients. (A) Effect of PGSI on high-parent heterosis (HPH). (B) Effect of PGSI on mid-parent heterosis (MPH).

As rapeseed (Brassica napus) is a typical polyploid species, we compared the influence of the A and C subgenomes on heterosis. In general, the influence of the C subgenome on heterosis was greater than that of the A subgenome across the six traits. Furthermore, we compared the influences of the 19 chromosomes making up the whole genome. Ranked from high to low by their influence on HPH, the chromosome PGSIs were those of C01, A03, C05, C06, C02, A06, C09, A01, C04, A07, C08, A02, A09, A04, C03, A10, C07, A05, and A08. Ranked from high to low by their influence on MPH, they were those of C05, A03, C04, C01, C03, A07, C09, C02, C06, A09, A01, A06, A10, C08, A02, A05, A04, A08, and C07 (S8 Table). Here, the influence was calculated by stacking up the absolute values of the correlation coefficients, where a positive correlation means that the higher the PGSI, the greater the heterosis, and a negative correlation means that the lower the PGSI, the greater the heterosis. The absolute value indicates the degree of impact.
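The "stacking" of absolute correlation coefficients described above can be sketched in a few lines. This is an illustration only: the correlation values below are made up, and the function name is hypothetical.

```python
def stacked_influence(corr_by_chrom):
    """Sum |r| over traits for each chromosome and rank from high to low.

    corr_by_chrom : {chromosome: {trait: correlation between the chromosome-level
                    PGSI and the heterosis of that trait}}
    """
    scores = {
        chrom: sum(abs(r) for r in by_trait.values())
        for chrom, by_trait in corr_by_chrom.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up correlation coefficients for three chromosomes and three traits.
corr = {
    "A03": {"GY": -0.20, "TSW": 0.10, "PH": -0.05},
    "C01": {"GY": -0.30, "TSW": 0.25, "PH": -0.10},
    "C05": {"GY": -0.15, "TSW": 0.05, "PH": -0.20},
}
ranking = stacked_influence(corr)
print([chrom for chrom, _ in ranking])
```

Because the sign is discarded before summing, a chromosome ranks high whenever its PGSI is strongly associated with heterosis in either direction.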

Genomic regions where parental genetic similarity impacts on heterosis of the Ogura hybrids

We calculated PGSI and performed LASSO analysis to identify the genomic regions responsible for the heterosis of traits, which were defined as h-QTLs. 172 h-QTLs were associated with GY-HPH (Fig 4 and S9 Table). Some h-QTLs had a relatively greater effect on heterosis, as shown with the darker colors in Fig 4. The darker the colors of circles or triangles, the greater the impacts on heterosis, either positive or negative. We also observed 130, 60, 102, 111, and 104 h-QTLs associated with TSW-HPH, NSS-HPH, NSP-HPH, NBP-HPH, and PH-HPH, respectively (S2–S6 Figs and S10–S14 Tables). Consistent with the results illustrated in Fig 3, the C subgenome had a higher impact on the heterosis, accounting for 57.0%, 56.6%, 62.7%, 62.7%, 63.1%, and 62.5% of the h-QTLs responsible for GY-HPH, TSW-HPH, NSS-HPH, NSP-HPH, NBP-HPH, and PH-HPH, respectively.
Fig 4

h-QTLs responsible for GY-HPH.

The circles represent the h-QTLs that positively contributed to the GY-HPH, and the triangles represent h-QTLs that were negatively correlated to GY-HPH. The darker the colors of circles and triangles, the greater the effects of the h-QTLs, either positive or negative. The colors on the chromosomes indicate the density of genes. The darker the blue, the lower the gene density; the darker the red, the higher the gene density. A and C stand for the two sub-genomes of Brassica napus. A limited number of h-QTLs on randomly piled contigs, whose positions on certain chromosomes were unknown, are not shown on the map. A positive effect, indicated with a circle on the maps, means the smaller the PGSI, the greater the heterosis, whereas a negative effect, tagged with a triangle, means the bigger the PGSI, the greater the heterosis.

In the present study, all the variables in the LASSO prediction model were defined as h-QTLs, and the variables with the top 10% and the bottom 10% of regression coefficients were defined as high-impact h-QTLs. All h-QTLs for a specific trait were displayed on the maps, and the underlying genes responsible for high-impact h-QTLs were investigated. There were 34 high-impact h-QTLs for GY-HPH according to the definition. The more heterozygous the h-QTLs such as Chr.C06-04 and Chr.C08-01, the higher the GY-HPH. Conversely, the more homozygous the h-QTLs such as Chr.C08-02 and Chr.C03-08, the higher the GY-HPH. The candidate genes covered by the high-impact h-QTLs for GY-HPH, PH-HPH, TSW-HPH, NSS-HPH, NSP-HPH, and NBP-HPH are listed in S15–S20 Tables.
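The selection rule for high-impact h-QTLs can be sketched as a ranking of the model's non-zero regression coefficients. The sketch assumes a rank-based cut (the top and bottom 10% of regions, rounded down); how ties or rounding were handled in the study is not stated here, and the function name and toy coefficients are hypothetical.

```python
def high_impact_hqtls(coefs, fraction=0.10):
    """Return (top, bottom) region names by regression coefficient.

    coefs    : {region_name: LASSO regression coefficient}
    fraction : share of selected regions taken from each tail
    Regions with a zero coefficient are dropped first, because LASSO did not
    select them as variables.
    """
    selected = {name: c for name, c in coefs.items() if c != 0.0}
    ranked = sorted(selected, key=selected.get)   # ascending coefficient
    k = int(len(ranked) * fraction)               # regions per tail, rounded down
    bottom = ranked[:k]
    top = ranked[len(ranked) - k:] if k else []
    return top, bottom

# Toy coefficients (made up): 21 regions, one of which has a zero coefficient.
coefs = {f"bin{i:02d}": (i - 10) * 0.01 for i in range(21)}
top, bottom = high_impact_hqtls(coefs)
print(top, bottom)
```

With 172 selected regions this rule would return 17 regions per tail, 34 in total, which agrees with the count reported for GY-HPH, although that agreement does not confirm the rounding used by the authors.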

Prediction of heterosis in a training population containing 400 hybrids via cross-validation

To predict the heterosis of hybrids produced with the Ogu-CMS system, we performed ten-fold cross-validation with 100 replicates in a training population of 400 hybrids produced from 50 Ogu-CMS lines and eight restorers. To identify the optimal model for prediction, we compared the predictabilities of six models, namely GBLUP_A, GBLUP_AD, LASSO_SNP, LASSO_100Kb, LASSO_500Kb, and LASSO_1Mb. Parameters from ANOVA for all the predictions are listed in S21 Table. All predictabilities were greater than 0.6 (Table 1). Predictability varied across traits. For example, PH was the most predictable trait across all models. In addition, the predictability of HPH was higher than that of MPH across most traits, excluding NSS (Tables 1 and 2).
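The cross-validation scheme can be sketched independently of the prediction model. The sketch below assumes that predictability is the Pearson correlation between observed values and pooled out-of-fold predictions, averaged over replicates; whether the study pooled predictions or averaged per-fold correlations is not stated in this section. A one-variable least-squares line stands in for the GBLUP and LASSO models, and all names and data are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cross_validated_predictability(X, y, fit, predict, k=10, replicates=100, seed=0):
    """Mean Pearson r between observed and out-of-fold predicted values."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(replicates):
        order = list(range(n))
        rng.shuffle(order)                          # new random partition per replicate
        folds = [order[i::k] for i in range(k)]
        predicted = [0.0] * n
        for fold in folds:
            held_out = set(fold)
            train = [i for i in order if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:                          # predict only unseen hybrids
                predicted[i] = predict(model, X[i])
        scores.append(pearson(y, predicted))
    return mean(scores)

def fit_line(X, y):
    """Stand-in model: least-squares line on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx

def predict_line(model, row):
    slope, intercept = model
    return slope * row[0] + intercept

# Simulated data: 400 "hybrids", one similarity-like feature, a noisy negative effect.
rng = random.Random(1)
X = [[rng.random()] for _ in range(400)]
y = [-2.0 * row[0] + rng.gauss(0, 0.3) for row in X]
r = cross_validated_predictability(X, y, fit_line, predict_line, replicates=5)
print(round(r, 3))
```

The `fit` and `predict` arguments keep the partitioning logic separate from the model, so the same loop could score any of the six models if each were wrapped in that pair of functions.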
Table 1

Comparison of the predictability of high-parent heterosis (HPH) of six traits among six models.

Methods      GY      TSW     NSS     NSP     NBP     PH
GBLUP_A      0.7165  0.7110  0.6686  0.7218  0.7543  0.8815
GBLUP_AD     0.7165  0.7251  0.6855  0.7218  0.7576  0.8834
LASSO_SNP    0.8000  0.7987  0.7616  0.8062  0.8246  0.9165
LASSO_1Mb    0.8828  0.8671  0.7990  0.8260  0.8697  0.9370
LASSO_500Kb  0.8649  0.8842  0.8516  0.8544  0.8745  0.9646
LASSO_100Kb  0.8379  0.8729  0.8837  0.8816  0.9246  0.9754

Notes: PH, plant height; NBP, number of branches per plant; NSS, number of seeds per silique; NSP, number of siliques per plant; TSW, thousand-seed weight; GY, grain yield.

Table 2

Comparison of the predictability of mid-parent heterosis (MPH) of six traits among six models.

Methods      GY      TSW     NSS     NSP     NBP     PH
GBLUP_A      0.6310  0.6463  0.7193  0.7136  0.6236  0.7808
GBLUP_AD     0.6310  0.6506  0.7230  0.7136  0.6276  0.7811
LASSO_SNP    0.7416  0.7550  0.8000  0.8000  0.7416  0.8485
LASSO_1Mb    0.7733  0.8071  0.8698  0.8199  0.8374  0.8928
LASSO_500Kb  0.7898  0.8134  0.9203  0.8342  0.8117  0.9240
LASSO_100Kb  0.7902  0.8352  0.8806  0.8670  0.8229  0.9490

Notes: PH, plant height; NBP, number of branches per plant; NSS, number of seeds per silique; NSP, number of siliques per plant; TSW, thousand-seed weight; GY, grain yield.

Generally, the LASSO models demonstrated higher predictabilities than the GBLUP models. The GBLUP_AD model did not exhibit significantly higher predictability than the GBLUP_A model. Among the four LASSO models, LASSO_SNP had the lowest predictability values across the six traits, regardless of the heterosis definition (HPH or MPH), indicating the necessity of reducing the dimension in the calculations. In terms of HPH, the optimal model for NSS, NSP, NBP, and PH was LASSO_100Kb, and the optimal models for GY and TSW were LASSO_1Mb and LASSO_500Kb, respectively (Table 1). In terms of MPH, the optimal model for GY, TSW, NSP, and PH was LASSO_100Kb, and the optimal models for NSS and NBP were LASSO_500Kb and LASSO_1Mb, respectively (Table 2). Overall, according to the results, an appropriate model should be selected to predict the heterosis of a specific trait. The LASSO_100Kb model was acceptable for the prediction of heterosis in all six traits (Fig 5).
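The per-trait model choice stated above can be checked directly against the HPH predictabilities of Table 1. The values in this short sketch are copied from that table; the function name is hypothetical.

```python
TRAITS = ["GY", "TSW", "NSS", "NSP", "NBP", "PH"]

# HPH predictabilities copied from Table 1 (columns in the order of TRAITS).
HPH = {
    "GBLUP_A":     [0.7165, 0.7110, 0.6686, 0.7218, 0.7543, 0.8815],
    "GBLUP_AD":    [0.7165, 0.7251, 0.6855, 0.7218, 0.7576, 0.8834],
    "LASSO_SNP":   [0.8000, 0.7987, 0.7616, 0.8062, 0.8246, 0.9165],
    "LASSO_1Mb":   [0.8828, 0.8671, 0.7990, 0.8260, 0.8697, 0.9370],
    "LASSO_500Kb": [0.8649, 0.8842, 0.8516, 0.8544, 0.8745, 0.9646],
    "LASSO_100Kb": [0.8379, 0.8729, 0.8837, 0.8816, 0.9246, 0.9754],
}

def best_model_per_trait(table, traits):
    """Return {trait: model with the highest predictability for that trait}."""
    return {
        trait: max(table, key=lambda model: table[model][j])
        for j, trait in enumerate(traits)
    }

best = best_model_per_trait(HPH, TRAITS)
for trait in TRAITS:
    print(trait, best[trait])
```

The result reproduces the statement in the text: LASSO_1Mb for GY, LASSO_500Kb for TSW, and LASSO_100Kb for NSS, NSP, NBP, and PH.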
Fig 5

Comparison of predictability of LASSO_100Kb model for high-parent heterosis (HPH) and mid-parent heterosis (MPH) among six agronomic traits.

Different letters indicate a significant difference (p = 0.01) between a comparison pair. PH: plant height, NBP: number of branches per plant, NSP: number of siliques per plant, NSS: number of seeds per silique, TSW: thousand-seed weight, and GY: grain yield.


Further validation of heterosis prediction models

We adopted the LASSO_100Kb, LASSO_500Kb, and LASSO_1Mb models to predict the heterosis of a 100-hybrid population generated from the Ogu-CMS-pool members and two independent restorers. The predicted values and the actual values observed in the field were compared to determine the predictability (Fig 6 and S22 Table). For HPH, the correlation coefficients between the predicted and actual values were 0.84, 0.68, 0.66, 0.52, 0.64, and 0.65 for PH, NBP, NSP, NSS, TSW, and GY, respectively. For MPH, the correlation coefficients between the predicted and actual values were 0.73, 0.36, 0.51, 0.62, 0.61, and 0.41 for PH, NBP, NSP, NSS, TSW, and GY, respectively. The predictability of PH was the highest, regardless of heterosis definition. As illustrated by the red and blue colors in Fig 6, the model could successfully indicate a superior restorer based on the performance of certain traits. For example, Restorer No. 9 was superior to Restorer No. 10 in PH-HPH and TSW-HPH; conversely, Restorer No. 10 was superior to Restorer No. 9 in NSS-HPH. However, the GY-HPH, NBP-HPH, and NSP-HPH depended on specific combinations between the restorers and the Ogu-CMS-pool members (S1 Table), and it was hard to tell which restorer was better in yielding higher GY-HPH, NBP-HPH, and NSP-HPH. Nevertheless, the combinations for the highest GY-HPH, NBP-HPH, and NSP-HPH could be recommended based on the result.
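The difference between ranking restorers and recommending specific combinations can be made concrete with a small sketch. The line identifiers and predicted values below are invented for illustration and are not results from the study.

```python
def top_combinations(predicted, n=3):
    """Return the n crosses with the highest predicted heterosis."""
    return sorted(predicted, key=predicted.get, reverse=True)[:n]

def mean_by_restorer(predicted):
    """Average predicted heterosis over all crosses of each restorer."""
    groups = {}
    for (cms_line, restorer), value in predicted.items():
        groups.setdefault(restorer, []).append(value)
    return {restorer: sum(values) / len(values) for restorer, values in groups.items()}

# Hypothetical predicted GY-HPH (%) for (CMS line, restorer) pairs.
predicted = {
    ("CMS01", "R9"): 42.0, ("CMS01", "R10"): 55.0,
    ("CMS02", "R9"): 61.0, ("CMS02", "R10"): 38.0,
    ("CMS03", "R9"): 47.0, ("CMS03", "R10"): 49.0,
}
print(mean_by_restorer(predicted))
print(top_combinations(predicted, n=2))
```

In this toy case the two restorer averages are close, yet the best individual crosses differ clearly from the rest and involve different restorers, which mirrors the situation described for GY-HPH, NBP-HPH, and NSP-HPH.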
Fig 6

Fitness between the predicted high-parent heterosis (HPH) and the actual observed HPH of the testing population containing 100 hybrid lines.

Each dot represents one of the 100 hybrid lines that arose from a cross between an Ogu-CMS-pool member and one of two restorers independent from the training population. The red and blue colors represent restorers 9 and 10, respectively. The grey areas indicate 95% confidence intervals.


Discussion

As a promising new breeding method, GS has been applied to the prediction of heterosis in various crops such as rice [11,12,16], corn [17], barley [18], wheat [19], ryegrass [20], and pumpkin [21]. The traits predicted in these studies were not limited to yield and yield components [16-21], but also included biotic- and abiotic-stress tolerance [22,23] and nutrient utilization efficiency [24,25]. Compared with previous GS studies on other crops, our research had the following characteristics. First, the genetic information used in our study involved 4.44 million SNPs, 2–3 orders of magnitude more than the number of molecular markers used in the previous studies [16-25]. The previous studies either involved resequencing data of crops with much smaller genomes such as rice [11,12,16], or only a small part of the genome-wide SNPs of crops with larger genomes such as barley [18] and corn [17]. We used deep resequencing (>30x) data of the entire polyploid rapeseed genome, providing a successful case for the prediction of heterosis in polyploid genomes with relatively large genome sizes. More SNP markers tend to imply more comprehensive genome coverage and the involvement of more genetic information. However, a higher number of SNPs does not always mean higher predictability. Previous studies showed that the accuracy of prediction increases with more molecular markers within a specific region, but it reaches a peak after which increasing the density of markers is no longer beneficial for prediction accuracy [26-31]. In addition, there is a relationship between the number of markers required and the degree of linkage disequilibrium (LD) of the species. The more rapidly LD decays in a species, the smaller the LD distance, and the more molecular markers are required within a chromosomal fragment of the same size. 
The LD distances of the populations used in our study were less than 100 Kb (S7 Fig), which is consistent with the previous study [15]. Second, instead of using SNPs directly, we introduced the concept of PGSI to drastically reduce the dimension of the calculation matrix. The predictability was, therefore, increased to 0.8828 for GY-HPH and even 0.9754 for PH-HPH, which were much higher than those in previous reports [16-24]. When we calculated the PGSI, we simply divided the chromosomes into segments of 1 Mb, 500 Kb, and 100 Kb, without considering local LD and gene numbers within the regions, which would have been technically complicated. LD is a concept of population genetics, meaning the non-random association of alleles at different loci. It is meaningful to calculate LD for a given population, but it is practically difficult to calculate PGSI based on LD because the LD distances of two parental genomes do not always match each other, and the LD distance for an individual genotype is rather difficult to determine. Third, we created the concept of h-QTLs and drew maps with overviews of h-QTLs across the genome, at which PGSI was positively or negatively associated with heterosis, as we showed for several agronomic traits. The h-QTL maps published in our study indicate the genomic regions (shown in dark colors) that would exert a high impact on trait-specific heterosis, meaning that the PGSI of those regions strongly correlates with the heterosis. We suggest that those regions be given particular consideration in GS breeding for heterosis. Since the LD distance of the rapeseed population is less than 100 Kb (S7 Fig), an h-QTL of 100 Kb in size might span two LD fragments on average. There must be a few genes that are responsible for a given kind of heterosis despite the ‘false’-gene majority (S15–S20 Tables). Most allelic changes might not be associated with trait-specific heterosis. 
However, it is beyond the scope of this paper to identify the major functional genes of each h-QTL. Hybrid vigor was well demonstrated here in the semi-winter rapeseed ecotype based on the Ogu-CMS system, with the top 10% of hybrids displaying 90.2% GY-HPH on average. Among the major yield components, NSP had the greatest correlation with GY, which is consistent with previous findings [4]. TSW was negatively correlated with NSS and NSP, and NSP was negatively correlated with NSS, since these traits, which are all limited by photosynthate allocation, counteract each other. Except for PH, higher values of traits such as TSW, NSS, NSP, and NBP were better for GY-HPH. Higher PH may give rise to higher biomass, which would positively affect yield. However, higher PH is not always better for GY-HPH; for example, lodging caused by excessive plant height could negatively influence the final yield. In the present study, there was no correlation between PH-HPH and GY-HPH, which could be attributed to relatively low lodging during the seasons when the field experiments were conducted. The results show that the male parents affected the height of the F1 plants more than the female parents did. One possible reason could be the much smaller size of the restorer-line (male) population compared with the CMS-line (female) population. The results of our study showed that a high overall PGSI would lead to high TSW; conversely, a low PGSI would favor a high NSS. To our knowledge, these findings have not been reported in other field crops. The commercial use of the Ogu-CMS system for the semi-winter rapeseed ecotype would be a breakthrough in rapeseed production in the Yangtze River Basin because of the advantages of this CMS system in the form of stability and complete male sterility [32]. Most rapeseed genotypes can serve as Ogu-CMS line maintainers; however, the process of breeding an Ogu-restorer is slow and laborious [33]. 
In the present study, we established a pool of 50 Ogu-CMS lines, which represents the genetic diversity of the semi-winter rapeseed ecotype (S1 Fig). Models that could predict the heterosis of F1 plants from crosses between the Ogu-CMS-pool members and potential restorers were developed. Using the models, we could test the combining ability between an Ogu-CMS-pool member and any potential restorer by pairing the nucleotide sequences in silico. Such a tool could bypass otherwise arduous manual work such as testing crosses in the field. Breeding efforts could, therefore, be focused on the transfer of restorer genes to a limited number of candidates, which is often achieved by backcrossing and usually takes several years. To facilitate such applications, we established BnaSNPDB, an interactive web portal for efficient retrieval and analysis of single nucleotide polymorphisms (SNPs) of 1,057 rapeseed germplasm accessions (https://bnapus-zju.com/bnasnpdb) [34]. SNPs of a genotype can be easily retrieved for in silico pairing with SNPs of an Ogu-CMS-pool member to simulate hybrid vigor. To validate the accuracy of the models, we created a test population using the same 50 Ogu-CMS lines and two independent restorer lines. The results of the present study show that different models are suitable for predicting different traits and different heterosis definitions. For example, the LASSO_100Kb model is suitable for predicting NSS-HPH, NSP-HPH, NBP-HPH, and PH-HPH, whereas the LASSO_500Kb and LASSO_1Mb models are suitable for predicting TSW-HPH and GY-HPH, respectively. PH heterosis predictability was the highest among the six agronomic traits explored since PH could be measured more accurately than the other traits, for which errors were more challenging to control. Numerous studies have compared the predictability of heterosis across various models. However, debate persists regarding the optimal method for predicting heterosis [11,16,35]. 
In the present study, the four LASSO methods were superior to the two GBLUP methods in terms of heterosis predictability. LASSO regression is characterized by variable selection and regularization of complexity while fitting the generalized linear model. Variable selection is essential to LASSO: rather than fitting all variables, the method retains only a subset of variables in the model to obtain better performance. Complexity adjustment controls the complexity of the model through a series of parameters to avoid overfitting. For a linear model, complexity is directly related to the number of variables in the model: the more variables, the higher the model complexity. Although including more variables in the fitting could often lead to a superior model, there is a risk of overfitting. In general, overfitting is possible when the number of variables is much greater than the number of data points, or when a discrete variable has too many unique values. In our study, there was no significant difference between GBLUP_A and GBLUP_AD in predicting heterosis. This may be because the kinship matrix of the additive effect had already captured much of the information in the kinship matrix of the dominance effect. The dominance effect did not, therefore, play a significant role in accounting for the rest of the variance. The LASSO_SNP model had the lowest predictability among the LASSO models, which could have arisen from the fact that we used individual SNP markers as variables. Too many variables could have led to overfitting, which, in turn, reduced heterosis predictability. In general, there is a higher degree of genetic diversity in the A subgenome than in the C subgenome in large genetic populations, which might be caused by the fact that the A subgenome integrates chromosome segments from the Brassica rapa genome through interspecific hybridization with B. rapa [15].
Moreover, evolutionary studies demonstrated that the genetic diversity of natural populations in the two ancestors of rapeseed varies greatly, with higher genetic diversity in the natural populations of Brassica rapa than in natural populations of Brassica oleracea [36]. With the above facts in mind, it was at first glance strange that more h-QTLs were distributed on the C subgenome than on the A subgenome. One possible reason could be that the genetically diverse regions in terms of SNP abundance might not be the functional regions, as there was a biased expression of functional genes between the two subgenomes. Further, the allelic variations on the genetically conserved C subgenome, not the ‘wild’ alleles on the genetically diverse A subgenome, contributed more to F1 heterosis. The asymmetric distribution of h-QTLs suggests selecting parents with more allelic variation on the C subgenome, which is valuable for F1 heterosis. The results do not imply that all forms of heterosis resulted from h-QTLs with low PGSI. On the contrary, PGSI should be high at Chr.C03-08, Chr.C04-04, Chr.C07-02, Chr.C04-05, and Chr.C08-05 to achieve high GY-HPH, NSS-HPH, NSP-HPH, NBP-HPH, and PH-HPH, respectively (S9–S14 Tables). Heterosis has been proposed as an alternative term for ‘heterozygosis’ to avoid limiting the term to the effects that would only be explained based on heterozygosity according to Mendelian inheritance principles [37]. Heterozygosity between parents does not always give rise to hybrid vigor. Genetic incompatibility between parents could reduce fitness via a form of ‘outbreeding depression’ [38,39]. We adopted the regression coefficients of the variables of the regression models as the criteria for selecting h-QTLs. Numerous h-QTLs responsible for GY-, TSW-, NSS-, NSP-, NBP-, and PH-HPH were identified and illustrated (Figs 4 and S2–S6).
The candidate genes responsible for high-impact h-QTLs were suggested (S15–S20 Tables). The field experiment, with 400 hybrids in three replicates, was not small in scale for rapeseed. Moreover, because the CMS pool consisting of maternal lines was genetically diverse, the population of 400 hybrids demonstrated a wide range of heterosis in all six agronomic traits investigated. In conclusion, we demonstrated in this study the high heterosis of F1 hybrids in the semi-winter rapeseed ecotype using the Ogu-CMS system and implemented GS-based models for the prediction of heterosis and the identification of heterotic parental combinations. PGSI negatively influenced GY, NSS, NSP, and PH, but positively influenced TSW. The C subgenome had a greater impact on heterosis than the A subgenome in the polyploid genome of B. napus. We went a step further and drew maps showing overviews of h-QTLs across the genome, at which PGSI was positively or negatively associated with GY-HPH, TSW-HPH, NSS-HPH, NSP-HPH, NBP-HPH, and PH-HPH, and listed the IDs of the genes underlying the h-QTLs. Using the GS-based prediction models, combining abilities between an Ogu-CMS-pool member and a potential restorer can be tested in silico by pairing the nucleotide sequences of the parents. Such models could sidestep laborious work, such as testing crosses in fields, while facilitating breeding efforts via the transfer of restorer genes to a restorer candidate. The achievements here provide a case of heterosis prediction in polyploid genomes with relatively large genome sizes.

Materials and methods

Definition of high and mid parent heterosis

High- and mid-parent heterosis were calculated according to the formulas below.

$$\mathrm{HPH} = \frac{F_1 - \mathrm{HP}}{\mathrm{HP}} \times 100\%,$$

where HPH stands for high-parent heterosis; $F_1$ is the phenotypic value of the F1 hybrid; and HP represents the phenotypic value of the high parent.

$$\mathrm{MPH} = \frac{F_1 - \mathrm{MP}}{\mathrm{MP}} \times 100\%,$$

where MPH stands for mid-parent heterosis; $F_1$ is the phenotypic value of the F1 hybrid; and MP is the average phenotypic value of the two parents.
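As a minimal sketch of these definitions, the two quantities can be computed as below. The sketch assumes that heterosis is expressed as the percentage deviation of the F1 value from the high-parent or mid-parent value (in line with percentages such as the 90.2% GY-HPH reported in the Discussion) and that the "high parent" is simply the parent with the larger phenotypic value. The function names are illustrative and are not taken from S1 Data.

```python
def high_parent_heterosis(f1: float, parent_a: float, parent_b: float) -> float:
    """HPH (%) of a hybrid relative to the parent with the higher value.

    Assumption: the high parent is the parent with the larger phenotypic value.
    """
    high_parent = max(parent_a, parent_b)
    return (f1 - high_parent) / high_parent * 100.0


def mid_parent_heterosis(f1: float, parent_a: float, parent_b: float) -> float:
    """MPH (%) of a hybrid relative to the average of its two parents."""
    mid_parent = (parent_a + parent_b) / 2.0
    return (f1 - mid_parent) / mid_parent * 100.0


if __name__ == "__main__":
    # Hypothetical phenotypic values (arbitrary units), for illustration only.
    print(high_parent_heterosis(30.0, 20.0, 10.0))
    print(mid_parent_heterosis(30.0, 20.0, 10.0))
```

Because MP can never exceed HP, MPH is at least as large as HPH for a hybrid with positive parental values.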

Construction of the Ogu-CMS pool

Semi-winter genotypes that represent the genetic diversity of cultivars in the Yangtze River Basin were carefully selected to develop Ogu-CMS lines. Genomes of the CMS lines were deeply (30×) sequenced and analyzed. Their genetic diversity was analyzed in the background of a worldwide germplasm collection consisting of 1,057 accessions [39]. After principal component analysis (PCA), 50 CMS lines were selected for the construction of the Ogu-CMS pool. PCA was performed using the smartPCA program in the EIGENSOFT package (https://github.com/DReichLab/EIG; v.6.0.1). Different ecotype samples were separated by two principal components (PCs), that is, the winter type was separated from the semi-winter and spring ecotypes by PC1, while the semi-winter type was separated from winter and spring ecotypes by PC2.

Genome resequencing

DNAs of 50 Ogura CMS lines and 10 restorers were extracted and sequenced using a previously described method [15]. Genomic DNA was extracted from young leaves using a cetyltrimethylammonium bromide-based protocol. A NanoDrop2000 spectrophotometer (Thermo Fisher Scientific) was used to determine the quality and concentrations of the genomic DNA. DNA libraries were constructed for each line for Illumina sequencing (Illumina, California, USA) according to the manufacturer’s (Biomarker Technologies Cooperation, Beijing, China) instructions. Following DNA-library construction, the accessions were resequenced on an Illumina HiSeq XTen (Illumina, California, USA) platform using a commercial service, with a 150-bp read length. In total, 2,862-Gb high-quality sequences were obtained. All clean reads were mapped to the ‘Darmor-bzh’ reference genome [13], resulting in a 38-fold coverage and a 99.2% mapping rate on average. SNPs and InDels within the 60 accessions were called using the HaplotypeCaller module in GATK [40] and were filtered based on parameters applied in a previous study [15].

Definition of parental genetic similarity index

The PGSIs of each cross were calculated using 1-Mb, 500-Kb, and 100-Kb window widths, which divided the entire genome into 873, 1,722, and 8,522 blocks, respectively. Within each block, the sites where both parents had the same nucleotide were marked as ‘2’. The sites where one parent had the same nucleotide as the reference but the other parent had a different nucleotide were marked as ‘1’. The sites where both parents had nucleotides different from the reference were marked as ‘0’. The PGSI of a block was two times the value obtained by summing the marks and dividing the sum by the number of loci available for computation.
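The marking scheme can be sketched as follows. This is an illustration under stated assumptions rather than the code in S1 Data: each parent is assumed to be an inbred line carrying one nucleotide per site, a missing call is represented by `None` and excluded from the "loci available for computation", the ‘0’ mark is applied when the parents differ from the reference and from each other, and the final scaling follows the wording above literally (two times the sum of marks divided by the number of loci). All function names are hypothetical.

```python
from collections import defaultdict


def site_mark(ref, p1, p2):
    """Mark one site: 2 = parents identical, 1 = exactly one parent matches
    the reference, 0 = parents differ from the reference and from each other."""
    if p1 == p2:
        return 2
    if (p1 == ref) != (p2 == ref):
        return 1
    return 0


def block_pgsi(marks):
    """PGSI of one block, following the text literally:
    two times the sum of marks divided by the number of usable loci."""
    return 2.0 * sum(marks) / len(marks)


def pgsi_by_window(positions, ref, p1, p2, window):
    """Group sites into fixed-width windows and return {block index: PGSI}."""
    marks_per_block = defaultdict(list)
    for pos, r, a, b in zip(positions, ref, p1, p2):
        if a is None or b is None:  # assumption: skip sites with missing calls
            continue
        marks_per_block[pos // window].append(site_mark(r, a, b))
    return {blk: block_pgsi(m) for blk, m in sorted(marks_per_block.items())}


if __name__ == "__main__":
    # Toy data: four SNP positions on one chromosome, 100-bp windows.
    print(pgsi_by_window([10, 120, 130, 250], "AAAA", "AAGT", "AGGC", 100))
```

Each block then yields one predictor per cross, which is how the window-based models reduce the number of variables relative to using individual SNPs.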

Phenotyping and phenotypic data analysis

The 50 Ogu-CMS lines were used as female parents, and the eight restorer lines were used as male parents to produce 400 hybrids based on an incomplete double-cross design. Another 100 hybrid lines were produced between the Ogu-CMS-pool members and two restorers, independent of the 400-hybrid training population. The training and testing populations were grown in Zhangye, Gansu Province (100°85E, 38°43N) in 2017, Hangzhou, Zhejiang Province (120°19E, 30°26N) in 2018, and Huzhou, Zhejiang Province (119°91E, 30°01N) in 2019. The phenotypic values of six agronomic traits, namely PH, NBP, NSP, NSS, TSW, and GY, were measured. The experiments were based on a randomized-complete-block design with three replicates. At least three plants were sampled for each genotype in each replicate. For the NSS trait, 30 siliques from the main inflorescences of each plant were harvested and counted to determine the number of seeds in each silique. To facilitate the subsequent analyses, phenotypic values from the three environments were integrated according to a linear mixed model as follows:

$$\mathbf{y} = \mathbf{1}_r\mu + \mathbf{Z}_g\mathbf{g} + \mathbf{Z}_e\mathbf{e} + \boldsymbol{\varepsilon},$$

where $\mathbf{y}$ is the vector of the mean value of each genotype in each environment calculated in the first step; $r$ is the sum of the number of genotypes measured in each environment and $\mathbf{1}_r$ is an $r$-dimensional vector of 1’s; $\mu$ is the common intercept; $\mathbf{g}$ is the vector of genotypic effects of all genotypes and $\mathbf{Z}_g$ is the corresponding design matrix for $\mathbf{g}$; $\mathbf{e}$ is the vector of environmental effects and $\mathbf{Z}_e$ is the corresponding design matrix for $\mathbf{e}$; and $\boldsymbol{\varepsilon}$ is the vector of residuals. The genotypic effect was treated as fixed to obtain the best linear unbiased estimate (BLUE) of each genotype across environments. The BLUE value of each genotype was used to perform all the analyses in the study. All linear mixed models were implemented using the lme4/R program [41].

Prediction methods

Two parametric methods, GBLUP and LASSO, were applied to predict heterosis. The general model of the two parametric methods, which includes all $m$ markers, is described as follows:

$$y = X\beta + \sum_{k=1}^{m} Z_k\gamma_k + \varepsilon,$$

where $y$ is an $n \times 1$ vector of the phenotypic values for each trait; $n$ is the number of individuals; $X$ is an $n \times q$ matrix of predictors used to predict $y$; $q$ is the number of predictors in the model; $\beta$ is a $q \times 1$ vector of model effects; $\varepsilon$ is an $n \times 1$ vector of residual errors with an assumed $N(0, I\sigma^2)$ distribution; $Z_k$ is a column of the genotype indicator variable of all $n$ individuals for marker $k$; and $\gamma_k$ is the additive genetic effect of marker $k$. The genotype of marker $k$ for individual $j$ (where $j = 1, 2, \ldots, n$) is coded as 1, 0, and -1 for the homozygote of the minor allele, the heterozygote, and the homozygote of the major allele, respectively. The GBLUP method assumes $\gamma_k \sim N(0, \phi^2)$, where $\phi^2$ represents the polygenic variance shared by all markers. The expectation of $y$ is $E(y) = X\beta$. The variance-covariance matrix is $\mathrm{var}(y) = V = K\phi^2 + I\sigma^2 = (K\lambda + I)\sigma^2$, where $\lambda = \phi^2/\sigma^2$ is the variance ratio and $K = \sum_{k=1}^{m} Z_k Z_k^{T}$ is a marker-generated kinship matrix. The GBLUP method exploits the genomic relationships between training populations and testing populations to predict the genomic values for unknown individuals without estimating marker effects. The GBLUP was implemented using the predhy/R program [11]. The LASSO method assumes $\gamma_k \sim N(0, \phi_k^2)$ and $\phi_k^2 \sim \mathrm{Exp}(\lambda)$ for all $k = 1, \ldots, q$, where $\lambda$ is a shrinkage parameter. The method directly estimates marker effects in the training population and predicts the genomic values of individuals in the testing population. When performing LASSO with the window-based models, the value of variable $k$ for individual $j$ (where $j = 1, 2, \ldots, n$) was defined as the PGSI of block $k$ instead of a single-nucleotide marker value. Since the number of SNP markers had little effect on the accuracy of the genomic prediction [11], 0.05% of all SNP markers were randomly selected when SNP markers were used as genetic information in the LASSO and GBLUP models.
The LASSO was implemented using the glmnet/R program in the present study [42]. Since the LASSO method performs variable selection, the variables retained by the LASSO were extracted and re-estimated using linear regression with the default functions in R. The Pearson correlation coefficient between the observed and predicted heterosis was used to calculate predictability. The code used to obtain the predictability is provided as S1 Data.
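To illustrate how the LASSO shrinks some effects exactly to zero and thereby selects variables, the following is a minimal cyclic coordinate-descent sketch. It is not the glmnet implementation used in the study: it assumes centered data with no intercept, performs no standardization, uses a single fixed penalty `lam` rather than a cross-validated path, and minimizes (1/(2n))·||y − Xβ||² + lam·||β||₁. The function names and the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the LASSO on plain Python lists.

    X is a list of n rows with q predictors each (e.g. block PGSIs);
    y is a list of n responses (e.g. heterosis values), assumed centered.
    """
    n, q = len(X), len(X[0])
    beta = [0.0] * q
    resid = list(y)  # residual y - X*beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(q)]
    for _ in range(n_iter):
        for j in range(q):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


if __name__ == "__main__":
    # Toy example: y depends on the first predictor only.
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    y = [2.0, -2.0, 2.0, -2.0]
    beta = lasso_coordinate_descent(X, y, lam=0.5)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print(beta, selected)
```

The retained coefficients are biased toward zero by the penalty, which is one motivation for re-estimating the selected variables with ordinary linear regression, as described above.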

Models applied for the prediction of heterosis

The GBLUP_A, GBLUP_AD, LASSO_SNP, LASSO_100Kb, LASSO_500Kb, and LASSO_1Mb models were applied to predict the heterosis of six agronomic traits. The two GBLUP models differ in whether they consider additive effects only (GBLUP_A) or both additive and dominance effects (GBLUP_AD). The four LASSO models differ in the units used to calculate PGSIs, which ranged from a single nucleotide (LASSO_SNP) to windows of increasing size: 100 Kb (LASSO_100Kb), 500 Kb (LASSO_500Kb), and 1 Mb (LASSO_1Mb). Since the different prediction models resulted in different heterosis predictabilities for the same trait, we adopted, for each trait, the model with the highest predictability to predict the heterosis of that trait. For example, the LASSO_1Mb and LASSO_100Kb models were selected for the prediction of GY-HPH and PH-HPH, respectively.

Predictability drawn from ten-fold cross-validation

Predictability was drawn from 10-fold cross-validation, in which nine parts of a sample were used to estimate the parameters used for the prediction of heterosis in the remaining part of the sample. Eventually, each individual was predicted once and used nine times to estimate the parameters. The Pearson correlation coefficient between the observed and predicted heterosis was used to calculate predictability. We replicated the cross-validation analysis 100 times, and the predictability of each trait was the average over the 100 replicates.
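The cross-validation procedure can be sketched as follows. The sketch is an assumption-laden illustration, not the code in S1 Data: a one-predictor least-squares fit (`fit_simple`, a hypothetical stand-in) replaces the LASSO or GBLUP model so that the example stays self-contained, and the fold assignment by random shuffling is one of several reasonable choices.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def fit_simple(xs, ys):
    """Stand-in model (assumption): least-squares line with one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def cv_predictability(xs, ys, k=10, reps=100, seed=1):
    """Average Pearson r between observed and k-fold-predicted values."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(reps):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k disjoint parts
        predicted = [0.0] * n
        for fold in folds:
            held_out = set(fold)
            train = [i for i in order if i not in held_out]
            slope, intercept = fit_simple([xs[i] for i in train],
                                          [ys[i] for i in train])
            for i in fold:  # each individual is predicted exactly once
                predicted[i] = intercept + slope * xs[i]
        scores.append(pearson(ys, predicted))
    return sum(scores) / reps


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [3.0 * x + 1.0 + ((-1) ** i) * 2.0 for i, x in enumerate(xs)]
    print(cv_predictability(xs, ys, k=10, reps=100))
```

Averaging over replicates reduces the dependence of the estimate on any single random partition of the hybrids into folds.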

Verification of prediction model

A testing population containing 100 hybrid lines was developed by crossing the Ogu-CMS-pool members with two independent restorers that were not used to calculate the predictabilities of the training population. For HPH, the optimal model for NSS-HPH, NSP-HPH, NBP-HPH, and PH-HPH was LASSO_100Kb, and the optimal models for TSW-HPH and GY-HPH were LASSO_500Kb and LASSO_1Mb, respectively (Table 1); these models were therefore adopted to predict the corresponding HPH values. For MPH, the optimal model for GY-MPH, TSW-MPH, NSP-MPH, and PH-MPH was LASSO_100Kb, and the optimal models for NSS-MPH and NBP-MPH were LASSO_500Kb and LASSO_1Mb, respectively (Table 2); these models were therefore adopted to predict the corresponding MPH values. The predicted values were compared with the actual values observed in the fields to determine the predictability, which indicated the validity of the prediction models.

Definition of h-QTLs and high-impact h-QTLs

The regression coefficients of the variables in the model were the criteria for selecting h-QTLs. All the variables were defined as h-QTLs, and the variables with the top 10% and bottom 10% of regression coefficients were considered high-impact h-QTLs. Excluding the h-QTLs on randomly piled scaffolds, whose positions on chromosomes are unknown, all the other h-QTLs for specific traits were displayed on the maps, and the genes underlying high-impact h-QTLs were investigated.
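The selection rule can be sketched as below, assuming the fitted coefficients are available as a mapping from variable name to regression coefficient. How ties and non-integer tail sizes were handled is not stated, so the truncation used here (with a minimum of one variable per tail) is an assumption, and the names are hypothetical.

```python
def high_impact_hqtls(coefficients, fraction=0.10):
    """Return (top, bottom) lists of variable names ranked by coefficient.

    coefficients: dict mapping h-QTL name -> regression coefficient.
    fraction: share of variables taken from each tail (10% in the text).
    """
    ranked = sorted(coefficients, key=coefficients.get)  # ascending order
    n_tail = max(1, int(len(ranked) * fraction))         # assumption: truncate
    bottom = ranked[:n_tail]   # most negative coefficients
    top = ranked[-n_tail:]     # most positive coefficients
    return top, bottom


if __name__ == "__main__":
    # Twenty hypothetical h-QTLs with coefficients from -10 to 9.
    coefs = {f"Q{i}": float(i - 10) for i in range(20)}
    top, bottom = high_impact_hqtls(coefs)
    print(top, bottom)
```

Taking both tails keeps the h-QTLs with the largest effects in either direction, that is, regions where either a low or a high PGSI is associated with greater heterosis.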

The naming of h-QTLs

The name of an h-QTL indicates its position on a chromosome. On the maps, the top and bottom of a chromosome correspond to its beginning and end, respectively. The h-QTLs of each chromosome were named based on the chromosome ID and a serial number. For example, Chr.C01-01 indicates h-QTL No. 01, counting from the top to the bottom of chromosome C01. Positive and negative h-QTLs are indicated on the maps for specific traits. A positive effect, shown with a circle, means that the smaller the PGSI, the greater the heterosis, whereas a negative effect, shown with a triangle, means that the bigger the PGSI, the greater the heterosis. The gradation of color (dark or light) represents the degree of an effect.

Drawing of h-QTL map

The h-QTL map was drawn using the RIdeogram/R program [43]. The density of genes on the chromosomes was plotted from the annotation file (Brassica_napus.annotation_v5.gff3.gz, https://www.genoscope.cns.fr/brassicanapus/data/).

Linkage disequilibrium analysis

We used a previously described method for linkage disequilibrium (LD) analysis [15]. Briefly, PLINK software (www.cog-genomics.org/plink2; v1.9) was used to calculate complete and partial LD between each pair of SNPs. The squared correlation coefficient (r²) values and the significance of all detected LD between polymorphic sites (P < 0.05) were analyzed for all chromosomes with a 1000-kb window.

Principal component analysis (PCA) plot of the first two components (PC1 and PC2) of the 1057 accessions.

PC1 accounts for 11.19% of the total variation and separates the winter-type accessions from the other accessions, whereas PC2 accounts for 6.90% of the total variation and separates the semi-winter-type accessions from the spring-type accessions. Green dots represent the spring ecotype, blue dots represent the winter ecotype, grey dots represent the semi-winter ecotype, and red dots represent the sterile lines that were used as female parents in the study. (TIF)

h-QTLs responsible for TSW-HPH.

The circles represent the h-QTLs that positively contributed to TSW-HPH, and the triangles represent the h-QTLs that were negatively correlated with TSW-HPH. The darker the colors of the circles and triangles, the greater the effects of the h-QTLs, either positive or negative. The colors on the chromosomes indicate the density of genes: the darker the blue, the lower the gene density; the darker the red, the higher the gene density. A and C stand for the two subgenomes of Brassica napus. A limited number of h-QTLs on randomly piled contigs, whose positions on chromosomes were unknown, are not shown on the map. A positive effect, indicated with a circle, means that the smaller the PGSI, the greater the heterosis, whereas a negative effect, indicated with a triangle, means that the bigger the PGSI, the greater the heterosis. (TIF)

h-QTLs responsible for NSS-HPH.

The circles represent the h-QTLs that positively contributed to NSS-HPH, and the triangles represent the h-QTLs that were negatively correlated with NSS-HPH. The darker the colors of the circles and triangles, the greater the effects of the h-QTLs, either positive or negative. The colors on the chromosomes indicate the density of genes: the darker the blue, the lower the gene density; the darker the red, the higher the gene density. A and C stand for the two subgenomes of Brassica napus. A limited number of h-QTLs on randomly piled contigs, whose positions on chromosomes were unknown, are not shown on the map. A positive effect, indicated with a circle, means that the smaller the PGSI, the greater the heterosis, whereas a negative effect, indicated with a triangle, means that the bigger the PGSI, the greater the heterosis. (TIF)

h-QTLs responsible for NSP-HPH.

The circles represent the h-QTLs that positively contributed to NSP-HPH, and the triangles represent the h-QTLs that were negatively correlated with NSP-HPH. The darker the colors of the circles and triangles, the greater the effects of the h-QTLs, either positive or negative. The colors on the chromosomes indicate the density of genes: the darker the blue, the lower the gene density; the darker the red, the higher the gene density. A and C stand for the two subgenomes of Brassica napus. A limited number of h-QTLs on randomly piled contigs, whose positions on chromosomes were unknown, are not shown on the map. A positive effect, indicated with a circle, means that the smaller the PGSI, the greater the heterosis, whereas a negative effect, indicated with a triangle, means that the bigger the PGSI, the greater the heterosis. (TIF)

h-QTLs responsible for NBP-HPH.

The circles represent the h-QTLs that positively contributed to NBP-HPH, and the triangles represent the h-QTLs that were negatively correlated with NBP-HPH. The darker the colors of the circles and triangles, the greater the effects of the h-QTLs, either positive or negative. The colors on the chromosomes indicate the density of genes: the darker the blue, the lower the gene density; the darker the red, the higher the gene density. A and C stand for the two subgenomes of Brassica napus. A limited number of h-QTLs on randomly piled contigs, whose positions on chromosomes were unknown, are not shown on the map. A positive effect, indicated with a circle, means that the smaller the PGSI, the greater the heterosis, whereas a negative effect, indicated with a triangle, means that the bigger the PGSI, the greater the heterosis. (TIF)

h-QTLs responsible for PH-HPH.

The circles represent the h-QTLs that positively contributed to PH-HPH, and the triangles represent the h-QTLs that were negatively correlated with PH-HPH. The darker the colors of the circles and triangles, the greater the effects of the h-QTLs, either positive or negative. The colors on the chromosomes indicate the density of genes: the darker the blue, the lower the gene density; the darker the red, the higher the gene density. A and C stand for the two subgenomes of Brassica napus. A limited number of h-QTLs on randomly piled contigs, whose positions on chromosomes were unknown, are not shown on the map. A positive effect, indicated with a circle, means that the smaller the PGSI, the greater the heterosis, whereas a negative effect, indicated with a triangle, means that the bigger the PGSI, the greater the heterosis. (TIF)

Genome-wide average LD decay in the sterile lines, restorer lines, and all lines.

The green, red, and blue curves display the rate of LD decay over distance (kb) in all sixty parental lines, the sterile lines, and the restorer lines, respectively. (TIF)

Information of the sixty parental lines involved in the study.

(XLSX)

Phenotypic values for all agronomic traits in the three environments.

(XLSX)

The genetic relationship between parents and offspring.

(XLSX)

The cross IDs of the top 10% for each agronomic trait.

(XLSX)

Correlation coefficients between the phenotypic values of the parents and the hybrids.

(XLSX)

Correlation coefficients among the six agronomic traits of the hybrids.

(XLSX)

Correlations between the PGSI and the heteroses.

(XLSX)

Absolute values of the correlation coefficients between the PGSI and the heteroses.

(XLSX)

Positions of the h-QTLs that are associated with the GY-HPH.

(XLSX)

Positions of the h-QTLs that are associated with the TSW-HPH.

(XLSX)

Positions of the h-QTLs that are associated with the NSS-HPH.

(XLSX)

Positions of the h-QTLs that are associated with the NSP-HPH.

(XLSX)

Positions of the h-QTLs that are associated with the NBP-HPH.

(XLSX)

Positions of the h-QTLs that are associated with the PH-HPH.

(XLSX)

Genes underlying the h-QTLs for GY-HPH.

(XLSX)

Genes underlying the h-QTLs for TSW-HPH.

(XLSX)

Genes underlying the h-QTLs for NSS-HPH.

(XLSX)

Genes underlying the h-QTLs for NSP-HPH.

(XLSX)

Genes underlying the h-QTLs for NBP-HPH.

(XLSX)

Genes underlying the h-QTLs for PH-HPH.

(XLSX)

Analyses of variances of predictability from 6 × 2 × 2 × 5 factorial design with six traits, two heteroses (HPH and MPH), two models (GBLUP and LASSO), and five methods (GBLUP_A, GBLUP_AD, LASSO_1Mb, LASSO_500Kb, and LASSO_100Kb).

(XLSX)

Comparison of the predictability for heterosis among testing population performed by LASSO using different windows drawn from tenfold cross-validation.

(XLSX)

Code to build model and obtain the predictability of the model.

(DOCX)

25 Jul 2021

Dear Dr Jiang,

Thank you very much for submitting your Research Article entitled 'Prediction of Heterosis in the Recent Rapeseed (Brassica napus) Polyploid by Pairing Parental Nucleotide Sequences' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by three independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time. Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,
Zhixi Tian, Ph.D, Associate Editor, PLOS Genetics
Li-Jia Qu, Section Editor: Plant Genetics, PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: Heterosis is very important in agricultural productions.
This work addressed yield heterosis in rapeseed by crossing 50 CMS accessions and 8 restorers. The population was then used for genomic selection. The work may provide useful resources for future heterosis sutdies. Below are suggestions for the work: 1. Actually, the sample size (n=400) is not enough for yield traits. Any discussions? 2. GO analyses should be removed. Such analyses applied in this paper are misleading! In one interval, there are tens of genes, and only one of them are the causal one. Pooling tens of "false" genes and a correct one for GO must lead to wrong conclusions. 3. No independent validation crosses for the modelling. The authors should generate tens of crosses (not from 50 CMS accessions and 8 restorers) and tested the accuracy of their GS model. 4. The genotypic and phenotypic data should be publically available for authors. Reviewer #2: Wang and his/her many colleagues made efforts to predict heteroisis of rapeseed by pairing parent nucleotide sequences. They developed prediction models by introducing the concept of regional parental genetic-similarity index (PGSI) and successfully reduced dimension in the calculation matrix to give more precise prediction. Moreover, they identified heterosis-QTLs and partitioned the impact of heterosis per subgenome and chromosome. They described a useful approach which is validated through the comparison of field observations and in silico predictions. There are some quite interesting points of this paper which was overall clear and well presented. The authors concluded that the diversity of C subgenome was more important for the rapeseed heterosis than that of A subgenome. I doubted this conclusion. To the best of my knowledge, the A subgenome was much more genetic diverse than the C subgenome. I would, therefore, expect that the A subgenome would be more important for the heterosis than the C genome. The explanation from the authors would be very much appreciated. 
I would also list some specifics for your consideration in the revision. 1. Please mention in the M&M how you calculate high and mid parent heterosis, although I know you defined them in the introduction. 2. Please discuss 10-fold cross-validation. Were there common parents between the training and validation sets? 3. Please list the cross IDs of the top 10% in a table or supplemental table. 4. Please include model used in the legend of Fig 5. 5. Fig 6: which correlations was significant? 6. You did not consider LD. You should better discuss the justification and consequences of not considering. Reviewer #3: This manuscript describes a interesting study on the heterosis of six agronomic traits in rapeseed. The experiments were carried out using several prediction models and the method of PGSI may reduce the computational load for prediction in the crops with large genomes. In addition, CMS lines and the restorer can be very useful resource for community. Overall, this manuscript provides meaningful results of the heterosis prediction in rapeseed with polyploid genomes. There are some comments that the authors may need to address. Lines 75-77, please add the detail for calculation of the HPH and MPH. For different traits, do you always use the higher values as the better parent for HPH? For some traits, the higher value is not always the expected phenotype, for example, higher plant height may result in logging. Lines 96-105, the authors mentioned the advantage of GS comparing to MAS, I would expected to discuss somehow in your results. The h-QTLs here are the genomic regions, so it’s really interesting to study how is the situation if only the h-QTLs were used for prediction. Lines 153-158, please provide the variance explained by the PC1 and PC2, and please use a different color to show the distribution of 50 CMS lines in Figure S1. Lines 171-175, from the Figure 1, PH in most of hybrids are higher than those in the male parents. 
As expected, the female parents would affect the hybrids more than the male parents; however, the correlations between the hybrids and their male parents were higher. Do you have an explanation or more details about this?
Lines 179-180, something may be wrong in Figure 2B and Table S3: NPB or NBP is not consistent with the manuscript text; please check.
Lines 198-202, what is the hypothesis behind studying the influence of HPH regardless of the traits, and what is the meaning of the results here?
Line 207, how did you get the h-QTLs? It is not clear; please provide more details in the methods.
Lines 208-225, you defined positive and negative effects, and both of them will result in higher GY-HPH. What is the difference between positive and negative effects? Will the positive effect result in a higher GY in hybrids?
Lines 227-235, using PGSI to identify the h-QTLs decreases the computational load. As far as I understand, it also decreases the mapping resolution and results in larger genomic regions, which in turn increases the number of genes underlying the h-QTLs. Then, what is the advantage of using a high density of markers?
Lines 238-242, these sentences are not clear; please rewrite them.
Line 246, 1860 genes! That is really a lot. Do you think all of these genes have an important influence on the heterosis of PH?
Line 277, cross-validation with only 10 replications is not enough, especially for the trait NSS. What is the meaning of A, B, C, D… in Figure 5?
Lines 292-294, how did you code the design matrices of the additive and dominance effects in your GBLUP_A and GBLUP_AD models? Usually, the dominance effect is related to heterosis, so what explains the lack of difference between the two models?
Lines 348-356, it’s a nice discussion about the influence of LD distance.
It may be valuable to estimate the LD distance in this population and then to calculate PGSI by considering the LD distance, such as 1(LD distance), 2(LD distance) and 3(LD distance) …
Line 385, should Figure 1 be changed to Figure S1?
Lines 524-525, should this be in the results part?
Lines 540-543, were all of the hybrids planted in the 3 years?
Lines 552-553, why not adjust for the effect of replicates directly in this model instead of using the mean value?
Lines 626-633, it is not clear how you got the predictability.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None
Reviewer #2: None
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

15 Sep 2021

Submitted filename: Responses to reviewers.docx

15 Oct 2021

Dear Dr Jiang,

We are pleased to inform you that your manuscript entitled "Prediction of Heterosis in the Recent Rapeseed (Brassica napus) Polyploid by Pairing Parental Nucleotide Sequences" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email.
Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.
Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Zhixi Tian, Ph.D
Associate Editor
PLOS Genetics

Li-Jia Qu
Section Editor: Plant Genetics
PLOS Genetics

www.plosgenetics.org
Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: I'm satisfied with the revision; no further requirements.

Reviewer #2: This version of the manuscript has been very much improved. The authors performed the calculation again with many more replications (100 instead of 10) and addressed all my concerns.

Reviewer #3: The authors have addressed my questions and comments well, and the manuscript is much improved and satisfactory to me. I have no further comments.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None
Reviewer #2: None
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re-enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-21-00874R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.
29 Oct 2021

PGENETICS-D-21-00874R1

Prediction of Heterosis in the Recent Rapeseed (Brassica napus) Polyploid by Pairing Parental Nucleotide Sequences

Dear Dr Jiang,

We are pleased to inform you that your manuscript entitled "Prediction of Heterosis in the Recent Rapeseed (Brassica napus) Polyploid by Pairing Parental Nucleotide Sequences" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund
PLOS Genetics

On behalf of:
The PLOS Genetics Team
Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
plosgenetics@plos.org | +44 (0) 1223-442823
plosgenetics.org | Twitter: @PLOSGenetics