RNA-seq transcriptome analysis of the immature seeds of two Brassica napus lines with extremely different thousand-seed weight to identify the candidate genes related to seed weight.

Xinxin Geng1,2,3, Na Dong1, Yuquan Wang1, Gan Li1, Lijun Wang2, Xuejiao Guo1, Jiabing Li1, Zhaopu Wen1, Wenhui Wei1,2.   

Abstract

Brassica napus is an important oilseed crop worldwide. Although seed weight is the main determinant of seed yield, few studies have focused on the molecular mechanisms that regulate seed weight in B. napus. In this study, the immature seeds of G-42 and 7-9, two B. napus doubled haploid (DH) lines with extremely different thousand-seed weight (TSW), were selected for a transcriptome analysis to determine the regulatory mechanisms underlying seed weight at the whole gene expression level and to identify candidate genes related to seed weight. A total of 2,251 new genes and 2,205 differentially expressed genes (DEGs) were obtained via RNA-seq (RNA sequencing). Among these genes, 1,747 (77.61%) new genes and 2,020 (91.61%) DEGs were successfully annotated. Of these DEGs, 1,118 were up-regulated and 1,087 were down-regulated in the large-seed line. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database analysis indicated that 15 DEGs were involved in ubiquitin-mediated proteolysis and proteasome pathways, which might participate in regulating seed weight. The Gene Ontology (GO) database indicated that 222 DEGs were associated with the biological process or molecular function categories related to seed weight, such as cell division, cell size and cell cycle regulation, seed development, nutrient reservoir activity, and proteasome-mediated ubiquitin-dependent protein catabolic processes. Moreover, 50 DEGs encoding key enzymes or proteins were identified that likely participate in regulating seed weight. A DEG (GSBRNA2T00037121001) identified by the transcriptome analysis was also previously identified in a quantitative trait locus (QTL) region for seed weight via SLAF-seq (Specific Locus Amplified Fragment sequencing).
Finally, the expression of 10 DEGs with putative roles in seed weight and the expression of the DEG GSBRNA2T00037121001 were confirmed by a quantitative real-time reverse transcription PCR (qRT-PCR) analysis, and the results were consistent with the RNA sequencing data. This work has provided new insights on the molecular mechanisms underlying seed weight-related biosynthesis and has laid a solid foundation for further improvements to the seed yield of oil crops.

Year:  2018        PMID: 29381708      PMCID: PMC5790231          DOI: 10.1371/journal.pone.0191297

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Brassica napus is an important oilseed crop that is used worldwide as a source of edible oil and animal feed. In recent years, it has also become popular in China as an ornamental plant and a renewable energy source [1]. Together with silique number per plant and seed number per silique, seed weight is one of the three major components of yield for oilseed rape [2]. Increasing the seed weight is the primary approach to improve the yield of B. napus [2]. On chromosome A09 alone, 17 QTL for seed weight have been identified [3], which illustrates the genetic complexity of seed weight regulation. Therefore, understanding the regulatory mechanisms underlying gene expression and selecting the appropriate candidate genes for seed weight are of great significance for yield improvement in oilseed rape breeding. However, few reports have focused on the molecular mechanisms that regulate the differences of seed weight in B. napus, which may be because seed weight is a complex quantitative trait that is also influenced by the environment [4]. Recently, the genome sequence of B. napus has been reported, and it can be utilized to study the regulatory mechanisms underlying seed weight [2]. Several seed weight (size) genes have been successfully cloned in other species, such as rice and Arabidopsis thaliana. In B. napus, Liu et al. [5] reported that variations in the auxin-response factor 18 (ARF18) genes on chromosome A09 could regulate the seed weight without changing the pod, and it represents the first yield gene in a polyploid crop acquired by map-based cloning. The molecular mechanism underlying yield genes has frequently been studied in rice, and the results provide rich information for our study on the seed weight of B. napus. Many genes related to grain weight and size have been reported in rice, including GW2, GW5, GL3, and GS5. GW2 and GW5 have similar functions in regulating grain weight. Song et al. [6] revealed that GW2 encodes a RING-type E3 ubiquitin ligase.
Loss of GW2 function regulates cell division and increases the cell number, enlarges the spikelet hull, accelerates the grain milk-filling rate, and results in enhanced grain width, weight and yield. Weng et al. [7] found that GW5 most likely regulates cell division during grain development through the ubiquitin-proteasome pathway. There are three key enzymes required for the ubiquitin-proteasome pathway: E1, E2 and E3 [8]. Ubiquitin is initially activated by E1, and then E2 and E3 work in conjunction to recognize the substrate protein and conjugate it to ubiquitin. Ubiquitin is then easily attached as a previously synthesized chain or a monomer. Finally, the ubiquitinated protein is transported to the proteasome for degradation [9]. Qi et al. [10] noted that GL3.1 encodes Ser/Thr phosphatase, a member of the protein phosphatase kelch (PPKL) family, and found that GL3.1 accelerates cell division by influencing protein phosphorylation in the spikelet, thereby resulting in longer grains and higher yields in rice. Further studies showed that GL3.1 directly dephosphorylates its substrate Cyclin-T1;3. The down-regulation of Cyclin-T1;3 in rice results in a shorter grain, which occurs through a novel phosphatase-mediated process during cell cycle progression. GS5 encodes a putative serine carboxypeptidase that acts as a positive regulator of cell division. The high expression of GS5 promotes and accelerates the cell cycle, and it then accelerates cell division of the outer glume and ultimately enlarges the grain size and increases the grain weight [11]. In A. thaliana, the cloned genes primarily regulated cell division, although they also controlled the cell number or cell size of the integument, embryo, and endosperm and affected the final seed weight or seed size. Du et al. [12] reported that SUPPRESSOR2 OF DA1 (SOD2) encodes ubiquitin-specific protease15 (UBP15), which regulates seed size by promoting cell division in the integuments of ovules and developing seeds. 
DA1 and DA2 encode a ubiquitin receptor and a RING-type protein with E3 ubiquitin ligase activity, respectively, which determine the final seed weight and organ size by restricting the period of cell proliferation [13,14]. The ARF2 gene links auxin signalling, cell division, and the final size of seeds and other organs [15]. Jofuku et al. [16] revealed that APETALA2 (AP2) plays an important role in determining the seed size, seed weight and seed oil and protein accumulation, and it acts via the maternal sporophyte and endosperm genomes to control seed weight and seed yield. Previous research on seed weight has provided helpful information for our work identifying candidate genes for seed weight via transcriptome analyses. Next-generation sequencing (NGS) technology has been widely used in biological research. Because the genomes of a large number of valuable species have been published, NGS technology has rapidly developed [17]. The transcriptome is the set of all messenger RNA and non-coding RNA molecules in one cell or a population of cells for a specific development stage or physiological condition. Transcriptome analysis is an important method of studying and exploring functional genes in plants. Compared with genome analyses, transcriptome analyses only assess transcribed genes; thus, they have a smaller research scope, which may produce more pertinent results [18]. In recent years, RNA-seq technology has rapidly developed into the most important method for transcriptome profiling [19]. Plant transcriptome analyses could provide considerable information on highly expressed genes, differentially expressed genes and new genes [20-22]. 
Transcriptome sequencing has been widely applied in studies on Brassica, such as transcriptome profiling of young floral buds of fertile and sterile plants [23], transcriptome profiling of pathogen resistance [24] and research to narrow down the number of candidate genes in identified quantitative trait locus (QTL) regions [25]. In this study, the transcriptomes of the immature seeds of two B. napus lines with extremely different thousand-seed weight (TSW) were compared. A total of 2,251 new genes and 2,205 differentially expressed genes (DEGs) were obtained. A bioinformatics analysis was performed to investigate the DEGs related to seed weight. A DEG (GSBRNA2T00037121001) located in a previously identified QTL region for seed weight is an important candidate gene for further seed weight research. These data will provide an important resource and helpful insights for identifying candidate genes related to seed weight and improving the final yield of B. napus.

Results and discussion

Comparison of the seed weight, seed size and fresh seed weight at different seed stages between two extreme lines

The seed size and seed weight of the large-seed line DH-G-42 were significantly greater than those of the small-seed line DH-7-9 (Fig 1, Table 1). The fresh seed weight differed little between the large-seed line and the small-seed line at 1 and 2 weeks after flowering (WAF). However, from 3 to 6 WAF, the fresh weight of DH-G-42 was clearly higher than that of DH-7-9 (Table 1). Thus, the 3rd week might be the key period in which the two lines differentiate. This is consistent with the observation that, in B. napus, the period from 20 days after flowering to maturity is the fastest growing period for seed development.
Fig 1

The seed phenotype of two extreme B. napus lines (large-seed line DH-G-42 and small-seed line DH-7-9).

Table 1

Statistics of the seed weight, seed size and fresh seed weight at different periods of development.

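| Sample ID | Thousand seed weight (g) | Seed length (mm) | Fresh seed weight, 1 WAF (mg) | Fresh seed weight, 2 WAF (mg) | Fresh seed weight, 3 WAF (mg) | Fresh seed weight, 4 WAF (mg) | Fresh seed weight, 5 WAF (mg) | Fresh seed weight, 6 WAF (mg) |
|---|---|---|---|---|---|---|---|---|
| DH-G-42 | 6.2±0.02 a* | 2.5±0.10 a* | 1.58±0.01 e | 1.46±0.04 e | 3.68±0.10 d | 7.03±1.04 c | 12.98±1.39 b | 14.62±0.42 a* |
| DH-7-9 | 2.45±0.08 b | 1.6±0.26 b | 1.07±0.07 e | 1.15±0.11 e | 1.17±0.06 e | 2.64±0.06 de | 3.8±0.26 d | 4.22±0.17 d |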

* Means followed by the same lowercase letters are not significantly different at P ≤ 0.05 by Tukey’s Honest Significant Differences test.

Transcriptome sequencing

After the construction and sequencing of two cDNA libraries, a total of 9.06 G and 8.68 G nucleotides were generated from the T3 (large-seed) and T4 (small-seed) libraries (named by the sequencing company), respectively (Table 2). After removing the adaptor sequences and the low-quality and short reads, 7.16 G and 6.88 G of data were obtained for the large-seed line and the small-seed line, respectively. The N content was 0% in both libraries. The Q30 percentage exceeded 90%, and the GC (guanine-cytosine) content was 46.57% and 47.38% for the large-seed and small-seed lines, respectively, which suggests that the sequencing data were highly accurate and reliable. The total reads of the large-seed line (70,844,224) and the small-seed line (67,971,356) were aligned to the reference genome of B. napus. Of these reads, 82.04% and 82.75%, respectively, were successfully mapped to the reference genome by Tophat (v2.0.12) (Table 3).
Table 2

Comparison of the raw transcriptome data for the large-seed line and small-seed line in B. napus.

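| Sample | Total nucleotides (G) | Used nucleotides (G) | GC (%) | N (%) | a Q20 (%) | Cycle Q20 (%) | b Q30 (%) |
|---|---|---|---|---|---|---|---|
| T3 | 9.06 | 7.16 | 46.57 | 0 | 98.79 | 100 | 93.91 |
| T4 | 8.68 | 6.88 | 47.38 | 0 | 98.79 | 100 | 93.87 |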

a Q20 indicates a quality score of 20, a 1% chance of error and 99% confidence

b Q30 indicates a quality score of 30, a 0.1% chance of error and 99.9% confidence
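
The relationship between a quality score and its error rate in the two footnotes above follows the standard Phred definition, Q = -10 log10(P). The following Python sketch is only an illustration of that conversion; the function names are our own and are not part of the sequencing pipeline used in the study.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the base-call error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability P back into a Phred quality score Q."""
    return -10 * math.log10(p)


# Report the error probability and base-call accuracy for Q20 and Q30.
for q in (20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f}, accuracy {100 * (1 - p):.1f}%")
```

A Q30 percentage above 90% therefore means that more than 90% of the bases have an estimated error probability of at most 0.1%.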

Table 3

Statistics of RNA-seq data aligned to the reference genome.

| Sample | Total reads | Total mapping reads | Uniquely mapping reads | Multiple mapping reads | Pair-mapped reads | Single mapped reads |
|---|---|---|---|---|---|---|
| T3 | 70,844,224 | 58,121,707 (82.04%) | 52,338,904 (90.05%) | 5,782,803 (9.95%) | 40,002,696 (68.83%) | 6,512,313 (11.2%) |
| T4 | 67,971,356 | 56,247,370 (82.75%) | 50,709,768 (90.15%) | 5,537,602 (9.85%) | 38,609,476 (68.64%) | 5,933,668 (10.55%) |
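
The percentages in Table 3 can be re-derived from the read counts. The Python sketch below does this under one assumption inferred from the reported values: the total mapping rate is relative to the total reads, whereas the unique and multiple mapping rates are relative to the mapped reads. The dictionary layout and the helper name `percent` are illustrative only.

```python
# Read counts copied from Table 3.
table3 = {
    "T3": {"total": 70_844_224, "mapped": 58_121_707,
           "unique": 52_338_904, "multiple": 5_782_803},
    "T4": {"total": 67_971_356, "mapped": 56_247_370,
           "unique": 50_709_768, "multiple": 5_537_602},
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


for sample, counts in table3.items():
    # Uniquely and multiply mapped reads together make up all mapped reads.
    assert counts["unique"] + counts["multiple"] == counts["mapped"]
    print(
        sample,
        "mapped:", percent(counts["mapped"], counts["total"]),
        "unique:", percent(counts["unique"], counts["mapped"]),
        "multiple:", percent(counts["multiple"], counts["mapped"]),
    )
```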

Analysis of new genes and differentially expressed genes (DEGs)

After filtering out the genes that contained only one exon or encoded short peptide chains (fewer than 50 amino acid residues), a total of 2,251 new genes and 97,963 genes were identified by comparison with the reference genome using Cufflinks. The sequences and detailed information are listed in S1 and S2 Tables. To identify the DEGs between the large-seed and small-seed samples, the gene expression levels were calculated as Fragments Per Kilobase of exon per Million fragments mapped (FPKM). A total of 2,205 genes were differentially expressed between the two libraries, of which 1,118 were up-regulated and 1,087 were down-regulated in T3 compared with T4. Detailed information is listed in S3 Table. The up-regulated and down-regulated genes between T3 and T4 were visualized by a hierarchical clustering analysis based on the FPKM values (Fig 2). About 29% of the DEGs had a log2FPKM value lower than 2, 63% had a value between 2 and 6, 8% had a value between 6 and 10, and one DEG (GSBRNA2T00002721001) had a log2FPKM value as high as 11.89. In Fig 2, all gene expression levels appear low because the reference level set for the log2FPKM values of the DEGs (10) was much higher than the usual level (6) between T3 and T4. Furthermore, because many genes are expressed during seed development, a higher reference level made it easier to screen the DEGs between T3 and T4.
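
As an illustration of the FPKM measure and the log2FPKM ranges described above, the following Python sketch computes FPKM from its standard definition (fragments × 10^9 / (exon length in bp × total mapped fragments)) and assigns a log2FPKM value to one of the ranges used in the text. The function names and the example gene (500 fragments, 2,000 bp, 20 million mapped fragments) are hypothetical, and the handling of values falling exactly on a range boundary is our assumption; Cufflinks itself estimates FPKM with a more elaborate statistical model.

```python
import math


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of exon per Million fragments mapped."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def log2_fpkm_bin(log2_value: float) -> str:
    """Assign a log2(FPKM) value to one of the ranges used in the text."""
    if log2_value < 2:
        return "<2"
    if log2_value < 6:
        return "2-6"
    if log2_value <= 10:
        return "6-10"
    return ">10"


# Hypothetical gene: 500 fragments on 2,000 bp of exons, 20 million mapped fragments.
value = fpkm(500, 2_000, 20_000_000)
print(value, log2_fpkm_bin(math.log2(value)))
```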
Fig 2

Hierarchical clustering analysis of DEGs based on FPKM data between T3 and T4.

The colour key represents the FPKM (Fragments Per Kilobase of exon per Million fragments mapped) normalized log2 transformed counts. Red represents high expression, and blue represents low expression. Each row represents a gene.

Functional annotations

To functionally annotate the B. napus seed transcriptome, the 2,251 new genes and 2,205 DEGs were searched against the GO (Gene Ontology), COG (Cluster of Orthologous Group), KEGG (Kyoto Encyclopedia of Genes and Genomes), Swiss-Prot and NR (non-redundant) databases using BLAST (Basic Local Alignment Search Tool) software, which successfully annotated 1,747 new genes and 2,020 DEGs. All of the annotated information is listed in Table 4 and S4 Table. Of these new genes, 1,321, 276, 336, 1,069 and 1,733 were annotated in the GO, COG, KEGG, Swiss-Prot and NR databases, respectively (Table 4). For the GO classification analysis of DEGs, 1,763 genes were assigned to three main GO functional categories and then divided into 57 sub-categories (Fig 3). Several DEGs were assigned to more than one sub-category. We then calculated the percentage of DEGs involved in each sub-category (S5 Table). The sub-category with the largest percentage in the “cellular component” category was cell part (DEGs accounted for 89.17% of all genes involved in this category), followed by cell (88.94%) and organelle (80.77%). The sub-category with the largest percentage in the “molecular function” category was binding (52.98%), followed by catalytic activity (44.36%). The sub-category with the largest percentage in the “biological process” category was cellular process (76.52%), followed by metabolic process (71.47%), response to stimulus (55.42%) and biological regulation (47.87%). These results indicated that the difference in seed weight might be primarily related to the differential expression of genes in these nine sub-categories, which provides a direction for further analysis.
Table 4

Functional annotation of new genes and differentially expressed genes in the COG, GO, KEGG, Swiss-Prot and NR databases.

| Annotation Database | Number of Annotated New Genes | 300 <= Length < 1000 | Length >= 1000 | Number of Annotated DEGs |
|---|---|---|---|---|
| COG_Annotation | 276 (12.26%) | 146 | 121 | 678 (30.75%) |
| GO_Annotation | 1,321 (58.69%) | 681 | 614 | 1,763 (79.95%) |
| KEGG_Annotation | 336 (14.93%) | 191 | 135 | 493 (22.38%) |
| Swissprot_Annotation | 1,069 (47.49%) | 520 | 526 | 1,451 (65.8%) |
| NR_Annotation | 1,733 (76.99%) | 882 | 813 | 2,015 (91.38%) |
| All_Annotated | 1,747 (77.61%) | 894 | 815 | 2,020 (91.61%) |
Fig 3

Gene ontology (GO) function classification of differentially expressed genes (DEGs) in B. napus.

A total of 1,763 genes were assigned to the three main GO functional categories and then divided into 57 sub-categories.

The COG functional classification analysis showed that the DEGs were distributed across 25 COG categories (Fig 4). We also calculated the percentage of DEGs in each category (S6 Table) and found that the largest percentage was “General function prediction only” (20.69%), followed by “Transcription” (12.24%), “Posttranslational modification, protein turnover, chaperones” (12.09%) and “Replication, recombination and repair” (10.47%). Many genes were differentially expressed in these categories, which might explain the different seed weights in the large-seed and small-seed lines.
Fig 4

Clusters of orthologous groups (COG) function classification of differentially expressed genes (DEGs) in B. napus.

A total of 678 DEGs were distributed across 25 COG categories.

Information on the metabolic pathways of the transcriptome is valuable for understanding the physiological processes of seed development after flowering. To evaluate the differences between the two lines, the metabolic pathways of the DEGs were analyzed by classification. An analysis using the KEGG database on biological pathways showed that a total of 493 genes were assigned to 97 pathways (Table 4; S7 Table). However, the total number of gene assignments across these 97 pathways was 505 rather than 493, which indicates that certain genes are involved in more than one KEGG pathway, such as the gene “Brassica_newGene_5407”, which is involved in cysteine and methionine metabolism; glycine, serine and threonine metabolism; and lysine biosynthesis. Among the 97 pathways, the largest pathway was ribosome, which contained 31 genes, followed by spliceosome (19), protein processing in endoplasmic reticulum (18), oxidative phosphorylation (15) and plant hormone signal transduction (14). Approximately 91% (2,015) and 66% of the DEGs (1,451) were successfully annotated using the NCBI's NR database and the Swiss-Prot database, respectively (Table 4).
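
The gap between 493 annotated genes and 505 pathway assignments arises because a gene counted once in the gene total is counted once per pathway in the pathway totals. The Python sketch below illustrates this with a toy annotation; only the entry for Brassica_newGene_5407 comes from the text, and the other gene identifiers and the function name are hypothetical.

```python
from collections import Counter

# Toy gene-to-pathway annotation. Only the first entry is taken from the text;
# the remaining gene identifiers are hypothetical placeholders.
gene_to_pathways = {
    "Brassica_newGene_5407": [
        "Cysteine and methionine metabolism",
        "Glycine, serine and threonine metabolism",
        "Lysine biosynthesis",
    ],
    "hypothetical_gene_A": ["Ribosome"],
    "hypothetical_gene_B": ["Ribosome", "Spliceosome"],
}


def summarize(annotation):
    """Return (unique genes, total gene-pathway assignments, genes per pathway)."""
    per_pathway = Counter(
        pathway for pathways in annotation.values() for pathway in set(pathways)
    )
    return len(annotation), sum(per_pathway.values()), per_pathway


unique_genes, assignments, per_pathway = summarize(gene_to_pathways)
print(unique_genes, assignments)   # assignments exceed unique genes
print(per_pathway.most_common(1))  # largest pathway in the toy data
```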

Identification of seed weight-related genes in the B. napus transcriptome and QTL region

Information on the seed weight (seed size) genes in rice, A. thaliana and B. napus was used to examine the pathways, biological processes, key enzymes and proteins related to cell division, the cell cycle, the ubiquitin-proteasome pathway, auxin response factors, Ser/Thr phosphatases, cyclins, and serine carboxypeptidases. The KEGG pathway analysis identified 6 and 9 DEGs in the ubiquitin-mediated proteolysis (ko04120) (Fig 5A) and proteasome (ko03050) (Fig 5B) pathways, respectively (Table 5). Analysis of the enzymes and proteins related to seed weight in the Swiss-Prot and NR databases (Table 6) revealed that 35 DEGs (24 up-regulated and 11 down-regulated) were annotated as encoding the key enzymes and proteins of the ubiquitin-proteasome pathway. Two genes encoded cell cycle-related proteins (a cyclin-dependent kinase inhibitor and a cyclin), and one up-regulated gene encoded an auxin response factor. Four genes (1 up-regulated and 3 down-regulated) encoded serine carboxypeptidases, and 6 genes (4 up-regulated and 2 down-regulated) encoded serine/threonine-protein phosphatases or their regulatory subunits or isozymes. The putative pathways of the above DEGs involved in seed weight formation are shown in Fig 6.
Fig 5

KEGG pathway analysis identified the ubiquitin-mediated proteolysis pathway (ko04120) and proteasome pathway (ko03050) for 15 DEGs between T3 and T4.

(A) ko04120, and (B) ko03050.

Table 5

Summary of DEGs related to seed weight (ubiquitin-mediated proteolysis pathway and proteasome pathway) in the KEGG database between T3 and T4.

| Pathway type | Pathway ID | DEG number | Up/Down (numbers of genes) | Gene ID |
|---|---|---|---|---|
| Ubiquitin-mediated proteolysis | ko04120 | 6 | 4 Up | GSBRNA2T00022870001; GSBRNA2T00031738001; GSBRNA2T00114729001; GSBRNA2T00151756001 |
| | | | 2 Down | GSBRNA2T00018127001; GSBRNA2T00025567001 |
| Proteasome | ko03050 | 9 | 5 Up | Brassica_newGene_1807; GSBRNA2T00063942001; GSBRNA2T00074567001; GSBRNA2T00116823001; GSBRNA2T00134599001 |
| | | | 4 Down | GSBRNA2T00150716001; GSBRNA2T00048424001; Brassica_newGene_2120; GSBRNA2T00108467001 |
Table 6

Summary of key enzymes and proteins related to seed weight in the Swiss-Prot and NR databases.

| Enzymes/protein | DEG number | Up/Down (numbers of genes) | Gene ID |
|---|---|---|---|
| Cyclin-dependent kinase inhibitor 3 | 1 | 1 Down | GSBRNA2T00109531001 |
| Cyclin-B1-2 | 1 | 1 Down | GSBRNA2T00051807001 |
| E3 ubiquitin ligase | 2 | 2 Up | GSBRNA2T00140145001; GSBRNA2T00097655001 |
| E3 ubiquitin-protein ligase | 19 | 14 Up, 5 Down | GSBRNA2T00049900001; GSBRNA2T00073497001; GSBRNA2T00080006001; GSBRNA2T00076273001; GSBRNA2T00052859001; GSBRNA2T00062381001; GSBRNA2T00125575001; GSBRNA2T00136480001; GSBRNA2T00017196001; GSBRNA2T00018260001; Brassica_newGene_5236; GSBRNA2T00063271001; GSBRNA2T00018008001; GSBRNA2T00032349001; GSBRNA2T00011257001; GSBRNA2T001195860011; Brassica_newGene_1243; GSBRNA2T00151834001; GSBRNA2T00097032001 |
| Ubiquitin-protein ligase 4 | 2 | 2 Up | GSBRNA2T00119586001; GSBRNA2T00017196001 |
| Ubiquitin family protein | 1 | 1 Down | GSBRNA2T00123522001 |
| 26S proteasome non-ATPase regulatory subunit | 5 | 2 Up, 3 Down | GSBRNA2T00150716001; GSBRNA2T00108467001; Brassica_newGene_1807; Brassica_newGene_2120; GSBRNA2T00134599001 |
| 26S protease regulatory subunit | 6 | 4 Up, 2 Down | GSBRNA2T00109634001; GSBRNA2T00074567001; GSBRNA2T00116823001; GSBRNA2T00018348001; GSBRNA2T00151849001; Brassica_newGene_5455 |
| Auxin response factor 4 | 1 | 1 Up | Brassica_newGene_719 |
| Serine carboxypeptidase | 3 | 3 Down | GSBRNA2T00105277001; GSBRNA2T00158449001; GSBRNA2T00136613001 |
| Serine carboxypeptidase S28 family protein | 1 | 1 Up | GSBRNA2T00001408001 |
| Serine/threonine-protein phosphatase PP1 isozyme | 2 | 1 Up, 1 Down | Brassica_newGene_1210; GSBRNA2T00046355001 |
| Serine/threonine protein phosphatase regulatory subunit | 3 | 2 Up, 1 Down | GSBRNA2T00026535001; GSBRNA2T00109497001; GSBRNA2T00080035001 |
| Serine/threonine-protein phosphatase 7 GN | 1 | 1 Up | Brassica_newGene_4770 |
| Auxin-responsive protein IAA12 | 1 | 1 Up | GSBRNA2T00025700001 |
| IAA-amino acid hydrolase 2 | 1 | 1 Down | GSBRNA2T00060750001 |
Fig 6

The putative pathways of the differentially expressed genes (DEGs) involved in seed weight formation.

The values in the bracket indicate the number of DEGs. Ub, Ubiquitin; E1, Ubiquitin activating enzyme; E2, Ubiquitin conjugating enzyme; E3, Ubiquitin ligase.

On chromosome A09 of B. napus, a hot QTL region of ~0.58 Mb between nucleotides 25,401,885 and 25,985,931 was identified to contain 91 candidate genes associated with seed weight according to SLAF-seq in our previous studies [2]. The information for these 91 candidate genes is presented in S8 Table. A comparison of the 91 candidate genes obtained by SLAF-seq and the 2,205 DEGs in the B. napus seed transcriptome identified one gene (GSBRNA2T00037121001) that is involved in inorganic ion transport and metabolism and encodes a ferritin according to the annotated information. Identification of the DEGs in QTL regions can substantially narrow down the number of candidate genes. The annotation of all these genes in further analyses will provide important information.
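
The comparison between QTL candidate genes and DEGs amounts to selecting the genes that overlap the QTL interval and intersecting them with the DEG list. The Python sketch below shows one way this could be done; the interval is the one reported in the text, whereas the gene identifiers, their coordinates and the function names are hypothetical placeholders rather than data from the study.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int


# QTL interval on chromosome A09 as reported in the text.
QTL_REGION = ("A09", 25_401_885, 25_985_931)


def genes_in_region(genes, region):
    """Return IDs of genes on the region's chromosome that overlap the interval."""
    chrom, lo, hi = region
    return {g.gene_id for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo}


def qtl_deg_candidates(genes, region, deg_ids):
    """Genes located in the QTL region that are also differentially expressed."""
    return genes_in_region(genes, region) & set(deg_ids)


# Hypothetical gene models and DEG list, for illustration only.
genes = [
    Gene("hyp_gene_1", "A09", 25_300_000, 25_302_000),  # outside the interval
    Gene("hyp_gene_2", "A09", 25_500_000, 25_503_000),  # inside, a DEG
    Gene("hyp_gene_3", "A09", 25_900_000, 25_904_000),  # inside, not a DEG
    Gene("hyp_gene_4", "C08", 25_500_000, 25_503_000),  # other chromosome
]
deg_ids = ["hyp_gene_2", "hyp_gene_4"]

print(qtl_deg_candidates(genes, QTL_REGION, deg_ids))
```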

Quantitative real-time reverse transcription PCR (qRT-PCR) analysis of DEGs

To verify the expression levels of the DEGs detected by RNA-seq, 10 DEGs that were predicted to encode key enzymes or proteins related to seed weight, together with the DEG GSBRNA2T00037121001 in the QTL region for TSW, were selected for qRT-PCR validation. Among these 10 randomly selected DEGs, 5 genes were up-regulated (higher expression levels in T3 than in T4), whereas the remaining 5 genes were down-regulated (lower expression levels in T3 than in T4) (Fig 7). The DEG in the QTL region was also up-regulated, as in the transcriptome analysis. The data obtained by qRT-PCR were consistent with the RNA-seq results, which supports the reliability of the transcriptome data.
Fig 7

Relative expression of 11 differentially expressed genes (DEGs) related to seed weight between two extreme B. napus lines via qRT-PCR.

Ten DEGs encoding key enzymes or proteins related to seed weight and one DEG in the QTL region were analyzed via qRT-PCR to validate the transcriptome.

Conclusions

In this study, RNA-seq technology was used to reveal the immature seed transcriptome of two extremely different lines (large-seed line and small-seed line) in B. napus. A total of 2,251 new genes and 2,205 DEGs were identified. According to the analysis of annotated information, the identified candidate genes related to seed weight mainly encoded the key enzymes or proteins. A DEG (GSBRNA2T00037121001) that was previously identified in the QTL region for seed weight might be an important candidate gene for further seed weight research in B. napus. Several important DEGs were analyzed by qRT-PCR to verify the RNA-seq results. This work has provided new insights into the molecular mechanisms underlying seed weight-related biosynthesis and has laid a solid foundation for further improvements to the seed yield of oil crops. Further annotation of those new genes and DEGs will also provide rich information and a reference transcriptome for future genome-wide gene expression research in B. napus.

Materials and methods

Plant materials

In 2005, one large-seeded and one small-seeded B. napus germplasm were collected. Among the progeny lines derived from a cross between these two germplasms, a few had a heavier thousand-seed weight (TSW) than the large-seed parent, and some had a lighter TSW than the small-seed parent. Two progeny lines with extremely different TSW were then selected for microspore culture to obtain their DH lines, named DH-G-42 and DH-7-9. DH-G-42 and DH-7-9 were planted and grown under non-stressed conditions from October 2014 to May 2015 in a field at the experimental farm of the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China. The flowering of DH-G-42 and DH-7-9 began on 12 March 2015 and 2 March 2015, respectively. Immature seeds at each time point from 1 to 6 weeks after flowering were obtained from the lines DH-G-42 and DH-7-9, frozen in liquid nitrogen immediately, and stored at -80°C.

Phenotypic observations

The seeds of three plants from each of the two lines were harvested at maturity to measure the seed weight and seed size. The samples were oven dried at 80°C to constant weight. The thousand-seed weight of each line was calculated as the mean of three replicate weights of 1,000 seeds. The seed size was calculated as the mean diameter of three seeds of each line, with three replicates. The fresh weight of immature seeds at each time point from 1 to 6 weeks after flowering was determined as the mean value of 100 seeds, with three replicates for each line.

RNA extraction, cDNA library construction and sequencing

Total RNA was isolated from the seeds at six different stages from 1 to 6 weeks after flowering using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. RNase-free DNase I was used to remove the residual DNA for 30 min at 37°C. The quality and quantity of the extracted RNA were assessed using a NanoDrop 2000 UV–Vis spectrophotometer (NanoDrop, Wilmington, DE, USA), and the samples showed a 260/280 nm ratio between 1.8 and 2.2 and an OD260/230 > 1.0. For each line, equal amounts of RNA from each stage, including three replicates, were pooled for the RNA-seq. The mRNA-seq libraries were prepared using the Illumina TruSeq RNA Sample Preparation Kit (Illumina Inc., San Diego, CA, USA), and the mRNA isolation, fragmentation and RNA-seq were performed by the Biomarker Technologies Corporation according to their standard protocol. The libraries were sequenced on the Illumina HiSeq™ 2500 sequencing platform (Illumina Inc., San Diego, CA, USA) at Biomarker Technologies Corporation in Beijing. The adaptor sequences, empty reads and low-quality sequences were filtered from the raw reads. Real-time monitoring was performed for each cycle during sequencing, and the ratio of high-quality reads with quality scores greater than Q30 (indicating a quality score of 30 and a 0.1% chance of an error; 99.9% confidence) in the raw reads and the guanine-cytosine (GC) content were calculated for quality control.

Identification of new genes and differentially expressed genes (DEGs)

Tophat (v2.0.12) is a fast mapping tool for RNA-seq reads that can identify splice junctions between exons [26]. Cufflinks can be used to assemble the reads into transcripts based on the mapping results [27]. Tophat and Cufflinks were used to align the sequencing reads to the reference genome of B. napus (1.2 Gb, download link: http://www.genoscope.cns.fr/brassicanapus/data/ [28]) for the differential gene and transcript expression analyses. This process can reveal new genes that have not previously been annotated in the reference genome and can also estimate the abundance of gene expression. The Fragments Per Kilobase of exon per Million fragments mapped (FPKM) method was used to calculate the abundance of gene expression. DESeq [29] is suitable for DEG screening of samples with biological replicates, and EBSeq [30] is suitable for samples without biological replicates. During the DEG screening, a false discovery rate (FDR) < 0.001 and a fold change ≥ 8 were used as thresholds. Here, the FDR was obtained via a P-value correction and considered the key index of the differential gene expression analysis. The fold change and FDR were calculated from the normalized expression values, and a gene with a fold change ≥ 8 and an FDR < 0.001 was considered significantly differentially expressed between T3 and T4.
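
The two screening thresholds can be illustrated with a short Python sketch. It is not the DESeq or EBSeq procedure: it assumes the Benjamini-Hochberg method as the P-value correction (the text does not name the method), a symmetric fold change computed from FPKM values, and a small pseudocount to avoid division by zero. All gene identifiers, expression values and P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fold_change(fpkm_t3, fpkm_t4, pseudocount=0.01):
    """Symmetric fold change (>= 1); the pseudocount is an assumption of this sketch."""
    hi, lo = max(fpkm_t3, fpkm_t4), min(fpkm_t3, fpkm_t4)
    return (hi + pseudocount) / (lo + pseudocount)


def select_degs(records, min_fold=8.0, max_fdr=0.001):
    """records: (gene_id, FPKM in T3, FPKM in T4, P-value). Returns DEG IDs."""
    fdrs = benjamini_hochberg([r[3] for r in records])
    return [r[0] for r, fdr in zip(records, fdrs)
            if fold_change(r[1], r[2]) >= min_fold and fdr < max_fdr]


# Hypothetical records, for illustration only.
records = [
    ("hyp_gene_1", 160.0, 10.0, 1e-6),  # large change, significant
    ("hyp_gene_2", 12.0, 10.0, 1e-6),   # significant but small change
    ("hyp_gene_3", 1.0, 40.0, 0.2),     # large change, not significant
]
print(select_degs(records))
```

Requiring both criteria keeps only genes whose expression change is large and statistically supported, which is why the second and third hypothetical genes are excluded.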

Functional annotation

Gene Ontology (GO) is the de facto standard for describing gene function, and it is widely used in functional annotations and enrichment analyses [31]. The Cluster of Orthologous Groups of proteins (COG) database is based on the phylogenetic relationships among bacteria, algae and eukaryotes. Gene products in an orthologous relationship can be classified using the COG database [32]. In an organism, different genes coordinate to exert biological functions, so pathway analyses are helpful for further interpreting gene functions. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database is the main public database on pathways [33]. In this study, a functional enrichment analysis of the new genes and DEGs was conducted by searching the GO, COG, KEGG pathway, Swiss-Prot [34] and Non-redundant protein (NR) [35] databases using BLAST software [36].

Quantitative real-time reverse transcription PCR (qRT-PCR) analysis

The transcript levels of ten candidate DEGs regulating seed weight were also verified by qRT-PCR. Total RNA [1 μg, pre-treated with DNase I (Promega, Madison, USA)] was reverse transcribed using a reverse transcriptase (Promega, Madison, USA). A 5 μl aliquot of 1:20 diluted cDNA was used as the template in a 20 μl PCR system. The qRT-PCR was performed using SYBR green reaction mix (SYBR Green qRT-PCR Master Mix; Toyobo) in an ABI PRISM 7500 Sequence Detection System (Applied Biosystems), with a pre-incubation at 95°C for 5 min followed by 40 cycles of denaturation at 95°C for 15 s, annealing at 60°C for 15 s, and extension at 72°C for 32 s. The 18S rRNA was used as the internal standard because it is uniformly expressed in B. napus tissue. All reactions were performed using one biological sample with three technical replicates. The comparative Ct method was used for the data analysis. Primer pairs were designed using Primer 5 software, and the primer sequences are available in S9 Table.
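
The comparative Ct method is commonly expressed as a relative expression of 2^(-ΔΔCt), and the following Python sketch shows that arithmetic. It assumes an amplification efficiency close to 100% for both the target and the reference gene (18S rRNA here) and averages the technical replicates before taking differences. The function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a list of Ct values from technical replicates.
    The reference gene corresponds to the internal standard (18S rRNA).
    """
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    delta_delta_ct = delta_sample - delta_calibrator
    return 2 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only (three technical replicates).
    fold = relative_expression(
        ct_target_sample=[22.0, 22.0, 22.0],
        ct_ref_sample=[12.0, 12.0, 12.0],
        ct_target_calibrator=[25.0, 25.0, 25.0],
        ct_ref_calibrator=[12.0, 12.0, 12.0],
    )
    print(fold)  # 8.0
```

In this toy case the target gene reaches the threshold three cycles earlier in the sample than in the calibrator after normalization, which corresponds to an eight-fold higher relative expression.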

Statistical analysis

The data were analyzed using SPSS version 16.0 (SPSS, Chicago, IL, USA). The variation between DH-G-42 and DH-7-9 in thousand mature seed weight, mature seed length, and fresh seed weight during seed development (1 to 6 weeks) was evaluated by analysis of variance (ANOVA) followed by Tukey’s Honest Significant Difference test. All statistical analyses were based on three replicates. Results were considered significant at P ≤ 0.05.
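
For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the following Python sketch. It only illustrates the arithmetic and does not reproduce the SPSS analysis: the P-value (from the F distribution) and the Tukey post hoc test are not implemented, and the measurements are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical measurements for two lines with three replicates each.
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```

The resulting F value would then be compared with the F distribution with the given degrees of freedom to obtain the P-value.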

S1 Table. The list of all genes by RNA-seq. (XLSX)

S2 Table. The list of new genes by RNA-seq. (XLSX)

S3 Table. The list of differentially expressed genes (DEGs) by RNA-seq. (XLSX)

S4 Table. The annotated information of DEGs. (XLSX)

S5 Table. The GO classification of DEGs. (XLSX)

S6 Table. The COG classification of DEGs. (XLSX)

S7 Table. KEGG pathway annotation of DEGs. (XLSX)

S8 Table. Annotated information for 91 candidate genes. (XLSX)

S9 Table. Primers for quantitative real-time PCR. (XLSX)