Literature DB >> 27532666

Transcriptome Profile of Near-Isogenic Soybean Lines for β-Conglycinin α-Subunit Deficiency during Seed Maturation.

Bo Song1, Lixin An2, Yanjing Han1, Hongxiu Gao1, Hongbo Ren3, Xue Zhao1, Xiaoshuang Wei1, Hari B Krishnan4, Shanshan Liu1.   

Abstract

Crossing, backcrossing, and molecular marker-assisted background selection produced a soybean (Glycine max) near-isogenic line (cgy-2-NIL) containing the cgy-2 allele, which is responsible for the absence of the allergenic α-subunit of β-conglycinin. To identify α-null-related transcriptional changes, the gene expressions of cgy-2-NIL and its recurrent parent DN47 were compared using Illumina high-throughput RNA-sequencing of samples at 25, 35, 50, and 55 days after flowering (DAF). Seeds at 18 DAF served as the control. Comparison of the transcript profiles identified 3,543 differentially expressed genes (DEGs) between the two genotypes, with 2,193 genes downregulated and 1,350 genes upregulated. The largest numbers of DEGs were identified at 55 DAF. The DEGs identified at 25 DAF represented a unique pattern of GO category distributions. KEGG pathway analyses identified 541 altered metabolic pathways in cgy-2-NIL. At 18DAF, 12 DEGs were involved in arginine and proline metabolism. The cgy-2 allele in the homozygous form modified the expression of several Cupin allergen genes. The cgy-2 allele is an alteration of a functional allele that is closely related to soybean protein amino acid quality, and is useful for hypoallergenic soybean breeding programs that aim to improve seed protein quality.

Entities:  

Mesh:

Substances:

Year:  2016        PMID: 27532666      PMCID: PMC4988716          DOI: 10.1371/journal.pone.0159723

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Soy-seed-derived products and their nutritional quality are affected by the subunit composition of seed storage proteins [1-4]. Glycinin (11S globulin) and β-conglycinin (7S) are the two main proteins in soybean seeds, accounting for ~70% of total seed proteins. By manipulating the identified variant alleles of glycinin and β-conglycinin, it is possible to breed soybean varieties with modified protein compositions, ranging from extremely high to extremely low 11S:7S ratios, which have led to improved nutritional values and food-processing properties [1,5-6]. In the past three decades, efforts to develop 7S-low-type soybean lines have led to the availability of various 7S or 11S globulin protein subunit null varieties among soybean germplasms [1,7-12]. Despite the deficiency of 7S and 11S major protein subunits, the nitrogen content of the mutant dry seeds is similar to (or higher than) wild-type cultivars, and most mutants grow and reproduce normally [2]. Specifically, β-conglycinin allergen-subunit-deficiency mutants have high nutritional value and low allergenic risk [1,5-6,13-14]. β-Conglycinin is the major seed protein of soybean (Glycine max (L.) Merr.), and comprises three subunits: α′ (76 kDa), α (72 kDa), and β (52 kDa) in varying proportion [15]. β-Conglycinin contains lower amounts of sulfur-containing amino acids and has a reduced gel-forming ability than glycinin [16]. Specifically, the α and α′-subunits of β-conglycinin negatively influence the nutrition of seed proteins and the gelation of tofu [1,5,12,17]. In addition, the three subunits are major allergens [5,17-20]. Genetic studies demonstrated that the absence of the α-subunit is controlled by a single recessive α-null allele, cgy-2 [21-23]. Gene symbols Cgy2/cgy2 were proposed for the genes that confer the presence or absence of the α-subunit of soybean β-conglycinin [21-22]. To date, the genetic effect of the α-null mutation and the molecular mechanism of cgy-2 allele variation remain unclear. The transcriptome corresponding to most of the protein coding genes is a small but important representation of the genome. Recently, RNA sequencing (RNA-seq) technologies have been developed that offer an opportunity to deliver fast, cost-effective, and accurate means to analyze the transcriptome in non-model organisms. With advances in RNA-seq, a large number of molecular markers and transcripts involved in specific biological processes could be identified. In soybean, transcriptome analyses of gene expression profiles during soybean seed development have been conducted mainly using microarray analysis and RNA-seq technology [24-27]. By utilizing DNA microarray analysis, Narikawa et al. [24] verified the changes in seed metabolism in the glycinin-null cultivar Tousan 205. Tousan 205 exhibited higher expression levels of stress-related genes, such as ascorbate peroxidase, than its parent cultivar ‘Tamahomare’. Their results suggested that the deficiency of glycinin caused an expression change of stress-related genes. In contrast to the Cgy-2 allele (conferring α-normal), information on the cgy-2 allele (conferring α-null) is limited. In the present study, we have examined the effect of cgy-2 allele on the amino acid composition and gene expression. The information generated from this study will be valuable to soybean breeders involved in the modification of soybean seed protein composition.

Materials and Methods

Plant materials

Near-isogenic line (NIL) cgy-2-NIL, carrying the cyg-2 allele (conferring α-null) (Fig 1), used for this study were derived from an α-subunit-null population, which has previously been used by our group to develop α-subunit-null improved lines with a Chinese soybean genetic background [12].
Fig 1

(A) Breeding process of the ‘cgy-2-NIL’. RPGR = recurrent parent genome recovery. Discontinuous lines link the last generation where the definitive near isogenic lines (NILs) were selected. Numbers alongside discontinuous lines show how many lines were obtained in each candidate line. (B) Graphical genotype for chromosome 20 in ‘cgy-2-NIL’ with the ‘DN47’ genetic background. (C) SDS-PAGE analysis of mature seed proteins of DN47 and four cgy-2-NIL lines: B-12088-3, B-12088-6, B-12088-8, and B-12088-12. (D) Immunoblot analysis of the seed extracts shown in (C) using antibodies specific for β-conglycinin subunits. The sizes of the protein markers (M) in kilodaltons are shown on the left of the image in (C). The arrows point to the α′, α, and β-subunits of β-conglycinin in Fig (C) and (D).

(A) Breeding process of the ‘cgy-2-NIL’. RPGR = recurrent parent genome recovery. Discontinuous lines link the last generation where the definitive near isogenic lines (NILs) were selected. Numbers alongside discontinuous lines show how many lines were obtained in each candidate line. (B) Graphical genotype for chromosome 20 in ‘cgy-2-NIL’ with the ‘DN47’ genetic background. (C) SDS-PAGE analysis of mature seed proteins of DN47 and four cgy-2-NIL lines: B-12088-3, B-12088-6, B-12088-8, and B-12088-12. (D) Immunoblot analysis of the seed extracts shown in (C) using antibodies specific for β-conglycinin subunits. The sizes of the protein markers (M) in kilodaltons are shown on the left of the image in (C). The arrows point to the α′, α, and β-subunits of β-conglycinin in Fig (C) and (D). About 45 days after sowing, fully expanded flowers were marked individually with a tag at the 4th, 5th, 6th, or 7th nodes on cgy-2-NIL and DN47 (Fig 2A). Pod samples were collected during seed development at 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, and 60 days after flowering (DAF, Fig 2B) during the summer of 2014. All seed samples (BC4F5) (combined cotyledon and seed coat) of a given age were pooled and stored at −80°C for future use (Fig 2C). Unusual-sized seeds were excluded from the soybean samples. Based on the assessment of the different expressions of the α-subunit gene between the cgy-2-NIL and DN47 by quantitative real-time reverse transcription PCR (qRT-PCR) (Fig 2D), five stages of soybean seeds collected at 18, 25, 35, 50, and 55 DAF were finally selected for RNA-seq analysis.
Fig 2

(A) Fully expanded flowers were marked individually with a tag at the 4th, 5th, 6th, or 7th nodes on cgy-2-NIL and DN47. (B) Pod samples were collected during seed development at 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, and 60 days after flowering (DAF). (C) Developmental changes in morphology, size, and weight of sampled seeds in cgy-2-NIL and DN47. (D) Comparison of transcript levels of the α-subunit gene between cgy-2-NIL and DN47. Differential expression of more than 50-fold was identified during the 25–55 days after flowering (DAF) development stages. Five stages of soybean seeds collected at 18, 25, 35, 50, and 55 DAF were finally selected to subjected to RNA-seq.

(A) Fully expanded flowers were marked individually with a tag at the 4th, 5th, 6th, or 7th nodes on cgy-2-NIL and DN47. (B) Pod samples were collected during seed development at 15, 18, 20, 25, 30, 35, 40, 45, 50, 55, and 60 days after flowering (DAF). (C) Developmental changes in morphology, size, and weight of sampled seeds in cgy-2-NIL and DN47. (D) Comparison of transcript levels of the α-subunit gene between cgy-2-NIL and DN47. Differential expression of more than 50-fold was identified during the 25–55 days after flowering (DAF) development stages. Five stages of soybean seeds collected at 18, 25, 35, 50, and 55 DAF were finally selected to subjected to RNA-seq.

SDS-PAGE and immunoblot analysis

Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and immunoblot analysis were performed as described earlier [20, 28]. Briefly, total seed proteins (25 μg) from DN47 and cgy-2-NIL lines were resolved on 10% polyacrylamide gels. Separated proteins were visualized by staining with Coomassie brilliant blue or electrophoretically transferred to a nitrocellulose membrane and incubated with polyclonal antibodies raised against the α-subunit of β-conglycinin. Immunoreactive proteins were detected using an anti-rabbit IgG-horseradish peroxidase conjugate followed by chemiluminescent detection.

Determination of seed protein content and amino acid analysis of ‘cgy-2-NILs’

Dry seeds of DN47 and cgy-2-NIL were harvested at maturity in 2014 and stored at room temperature. Ten plants of each cgy-2-NIL and DN47 were examined. Total seed nitrogen was measured using the Micro-Kjeldahl method (Foss, 2300 Kjeltec Analyzer Unit). The crude protein content was determined by calculating the nitrogen content and then multiplying the result by a conversion factor of 6.25. Total amino acids (AAs) were obtained from hydrolysis of seed meal in 6 M HCl for 22 h in sealed evacuated tubes at a constant boiling temperature of 110°C. An amino acid analyzer (Hitachi L-8800; Hitachi, Tokyo, Japan) was used to determine the AA composition of the hydrolysates. Free AAs were extracted from 5.00 g of seed meal. Seed meal (seeds were sampled using a sample quartiles method, fully dried with mill grinding through a 0.25-mm sieve, and thoroughly mixed) was finely homogenized in 30 mL of sulfosalicylic acid (10 g per 100 mL) and disrupted ultrasonically for 30 min. The supernatant was centrifuged at 5000 × g for 5 min. The resultant supernatant was filtered through a 22-μm GD/X sterile disposable syringe filter. A Hitachi L-8800 amino acid analyzer was then used to analyze the filtrate. The amino acid quality was compared between cgy-2-NIL and DN47 using a scoring method. The amino acid score (AAS) was calculated according to the scoring pattern suggested by the Food and Agriculture Organization and World Health Organization (FAO/WHO) [29]. Concentration was expressed as grams of amino acid/16 gN in the test protein divided by grams of amino acid/16 gN in the scoring pattern. Each data set and reference patterns were also used to calculate EAAI (essential amino acid index) [30, 31]. The EAAI is the geometric mean of the individual amino acid scores and is equal to the antilogarithm of the individual scores. The AAS was calculated using the following formula: The EAAI values were assigned a maximum of 1.00 and a minimum of 0.01. Feedstuffs are rated as good-quality protein sources when the EAAI is ≥ 0.90, adequate when approximately 0.80, and inadequate below 0.70 [32].

RNA isolation, cDNA library construction, and Illumina deep sequencing

Seed samples harvested at five growth stages corresponding to 18, 25, 35, 50, and 55 DAF from DN47 and cgy-2-NIL in the summer of 2014 were used for RNA-seq analysis. Two individual biological replicates were tested for the five developmental stages, resulting in 20 samples. In order to minimize biological variation, RNA from separate biological samples was used for the two biological replicates per stage, the values of correlation coefficient (R2 value) of all DEGs for each 2 biological replicates ranged from 0.961 to 0.992. All of the samples were stored in liquid nitrogen immediately after collection in the field and then transported to a −80°C freezer in our laboratory at the Northeast Agriculture University soybean research center. Total RNA was extracted from each sample using the improved cetyl trimethylammonium bromide method [33]. RNA degradation and contamination was monitored on 1% agarose gels. A NanoPhotometer spectrophotometer (Implen, CA, USA) was used to check the RNA purity. A Qubit RNA Assay Kit in a Qubit 2.0 Fluorometer (Life Technologies, CA, USA) was used to measure the RNA concentration and an RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA) was utilized to assess the RNA integrity. All RNA samples had RNA integrity number (RIN) values above 6.5. mRNA was extracted using Dynabeads oligo (dT) (Dynal; Invitrogen). Double-stranded cDNAs were synthesized using reverse transcriptase (Superscript II; Invitrogen) and random hexamer primers. To select preferentially cDNA fragments of 200 bp in length, the library fragments were purified using the AMPure XP system (Beckman Coulter, Beverly, CA, USA). DNA fragments with ligated adaptor molecules on both ends were enriched selectively using the Illumina PCR Primer Cocktail in a 10-cycle PCR reaction. Products were purified using the AMPure XP system and quantified using the Agilent high-sensitivity DNA assay on the Agilent Bioanalyzer 2100 system. cDNA Library concentration was first quantified using a Qubit 2.0 fluorometer (Life Technologies), and then diluted to 1 ng/μl before checking insert size on an Agilent 2100 and quantifying to greater accuracy by quantitative PCR (Q-PCR) (library activity >2 nM).The library preparations were sequenced on an Illumina Hiseq 2000 platform and 100-bp paired-end reads were generated. Illumina sequencing was performed by Novogene Bioinformatics Technology Co., Ltd., Beijing, China (www.novogene.cn).

Bioinformatic analysis of differentially expressed genes (DEGs)

To obtain high-quality clean reads, raw data (raw reads) in fastq format were first processed using in-house Perl scripts. The calculation of Q20, Q30, GC-content, and all the downstream analyses were based on the high-quality clean data. The reference genome (ftp://ftp.ensemblgenomes.org/pub/release-23/plants/fasta/glycine_max/) and gene model annotation files were downloaded from the genome website directly. We used HTSeq v 0.6.1 (www-huber.embl.de/users/anders/HTSeq/) to count the read numbers mapped to each gene. Data were then provided in reads per kilobase per million reads (RPKMs) [34]. Differential expression analysis was performed using the DESeqR package (1.10.1) [35]. P-values were adjusted using the Benjamini and Hochberg approach; with a P-value < 0.05 being used as the threshold for significant differential expression. Gene Ontology (GO) (http://www.geneontology.org/) analysis was performed by the GOseq R package [36], and GO terms with corrected P-values < 0.05 were considered significantly enriched for the DEGs. We used KOBAS software (KOBAS, Surrey, UK) to test the statistical enrichment of DEGs in KEGG pathways (http://www.kegg.jp/kegg/pathway.html). Datasets were deposited in the GEO (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=qpwjwcsqjhahzed&acc=GSE79327) with accession number GSE79327.

qRT-PCR confirmation of the Illumina sequencing data

RNA-seq data were further validated using qRT-PCR for six selected genes, using gene-specific primer sets (Fig 3A). Primer pairs were designed using the Primer 5 software. Actin was amplified along with the target gene as an endogenous control to normalize expression between different samples. qRT-PCR was performed using a real-time RT-PCR kit (Takara, Japan), on a CFX96 Real-Time System (BioRad, USA). The delta-delta-cycle threshold (Ct) method was used to calculate the relative expression of each mRNA [37].
Fig 3

(A) Primer sequences used in quantitative real-time reverse transcription PCR (qRT-PCR) analysis for validation of the expressed genes in Illumina sequencing. (B) Comparison between the gene expression ratios obtained from RNA-seq data and that from qRT-PCR. The RNA-seq log2 value of the expression ratio (y-axis) was plotted against the developmental stages (x-axis). (C) qRT-PCR analysis of differentially expressed genes between cgy-2-NIL and DN47. The transcript abundance from the RNA-seq data is shown at the top of the panel for each gene. RPKM: reads per kilobase per million reads. DAF = days after flowering.

(A) Primer sequences used in quantitative real-time reverse transcription PCR (qRT-PCR) analysis for validation of the expressed genes in Illumina sequencing. (B) Comparison between the gene expression ratios obtained from RNA-seq data and that from qRT-PCR. The RNA-seq log2 value of the expression ratio (y-axis) was plotted against the developmental stages (x-axis). (C) qRT-PCR analysis of differentially expressed genes between cgy-2-NIL and DN47. The transcript abundance from the RNA-seq data is shown at the top of the panel for each gene. RPKM: reads per kilobase per million reads. DAF = days after flowering.

Results

Phenotype screening for α-subunit nulls using SDS-PAGE and immunoblot analysis

The cgy-2-NILs were derived as outlined in Fig 1A. In the BC3F2 population [12], three individuals, B12038, B12040, and B12088, with recurrent parent genome recoveries of 98.47%, 98.98%, and 99.49%, respectively, were selected as α-null donor parents. BC4F2 progeny derived from these selected BC3F2 parents with the homozygous cgy-2 gene, were selfed to obtain 69 BC4F3:4 individuals, which were designated as pre-ILs, including three sets: 24 lines for B12038, 19 lines for B12040, and 26 lines for B12088 (Fig 1A). These pre-ILs were genotyped again using polymorphic molecular markers for verification. Uniformity regarding plant type was also examined within each line. Seventeen lines containing only a single introgression in chromosome 20 containing the cgy-2 gene in B12088 progeny were obtained (Fig 1B). Combined with stringent phenotypic selection, a final collection of four ideal NILs was obtained, with each NIL containing the cgy-2 allele (hereafter named ‘cgy-2-NIL’) (Fig 1A). BC4F5 seeds obtained in 2014 from the above experiments were used for all subsequent experiments in this study. The recurrent parent DongNong 47 (DN47), containing all storage protein subunits, was used as the control (Fig 1C, lane 1). cgy-2-NIL only differed from the DN47 control by its introgressed donor DNA fragment containing the cgy-2 gene (Fig 1B). Therefore, any observable phenotypic differences are expected to be a result of the cgy-2 gene. The absence α-subunit of β-conglycinin in the developed cgy-2-NIL was verified by SDS-PAGE (Fig 1C) and immunoblot analysis (Fig 1D). An examination of the total seed protein profile of DN47 revealed the presence of all three subunits of β-conglycinin (Fig 1C). In contrast, the cgy-2-NILs (12088–3, 12088–6, 12088–8, and 12088–12) failed to accumulate the 72-kDa α subunit of β-conglycinin (Fig 1C). This observation was further confirmed by western blot analysis (Fig 1D). β-conglycinin-specific antibodies recognized all the three subunits of β-conglycinin from DN47, while the cgy-2-NILs showed no reactivity against the 72-kDa α subunit of β-conglycinin. This observation confirms that the α subunit of β-conglycinin is absent in the cgy-2-NILs.

Effect of allelic variation of the α-subunit locus on soybean amino acid composition

To understand the effect of allergen-α-subunit-deficiency on soybean amino acids (AAs) composition, the AA content and nutritional quality were investigated. The crude protein content, AA concentration, and free amino acids (FAA) concentrations of the homozygous cgy-2-NIL and the recurrent parent DN47 were compared. In cgy-2-NIL compared with DN47, there was a 4.11%, 4.16%, 5.20%, and 11.96% increase in crude protein content, total AA content, total essential amino acid (TEAA) content, and sulfur-containing (Met and Cys) content, respectively (Table 1). The concentration of Thr, Val, Met, and Ile increased significantly in cgy-2-NIL, resulting in a significant increase in TEAA content. The sulfur-containing (Met and Cys) AA concentration increased significantly in cgy-2-NIL. The total AA concentration also increased in cgy-2-NIL because of the general increase in constituent content of most AAs (Table 1).
Table 1

Comparison of amino acid and free amino acid contents of mature seeds between ‘DN47’ and its near-isogenic line,‘cgy-2 NIL’.

A.A.F.A.A.
D47 (%)NIL (%)D47 (mg/g)NIL (mg/g)
Essential amino acids
Thr1.38 ± 0.091.54 ± 0.03*0.2127 ± 0.01150.2783 ± 0.0084*
Val1.48 ± 0.041.55 ± 0.02*0.1007 ± 0.01670.1043 ± 0.0032
Met0.40 ± 0.060.52 ± 0.01*0.0660 ± 0.00350.0840 ± 0.0010*
Ile1.48 ± 0.061.58 ± 0.02*0.0720 ± 0.00690.0707 ± 0.0029
Leu2.71 ± 0.052.80 ± 0.040.1133 ± 0.00980.1510 ± 0.0046*
Phe1.80 ± 0.041.79 ± 0.030.1480 ± 0.00170.1677 ± 0.0025*
Lys2.29 ± 0.032.36 ± 0.040.2287 ± 0.02710.2537 ± 0.0012
T.E.A.A.11.53 ± 0.2412.13 ± 0.16*0.9413 ± 0.05771.1097 ± 0.0015*
Non-essential amino acids
Asp3.85 ± 0.124.06 ± 0.04*0.7033 ± 0.00460.6263 ± 0.0381*
Ser1.81 ± 0.061.90 ± 0.020.0825 ± 0.00090.1133 ± 0.0068*
Glu5.93 ± 0.135.87 ± 0.040.5113 ± 0.01440.5020 ± 0.0156
Gly1.45 ± 0.021.53 ± 0.03*0.0633 ± 0.00580.0913 ± 0.0006*
Ala1.47 ± 0.031.58 ± 0.02*0.0940 ± 0.00520.1663 ± 0.0045*
Cys0.52 ± 0.010.51 ± 0.010.1873 ± 0.00230.1797 ± 0.0075
Tyr1.14 ± 0.031.19 ± 0.030.0873 ± 0.00230.1220 ± 0.0070*
His0.89 ± 0.020.99 ± 0.01*0.1367 ± 0.02890.2873 ± 0.0452*
Arg2.36 ± 0.082.58 ± 0.02*0.7547 ± 0.02141.9817 ± 0.3537*
Pro1.77 ± 0.071.75 ± 0.020.1450 ± 0.03640.2043 ± 0.0015*
T.A.A32.72 ± 0.4134.08 ± 0.37*3.7128 ± 0.06735.3837 ± 0.4481*
T.S.A.A.0.92 ± 0.051.03 ± 0.01*0.2533 ± 0.00580.2637 ± 0.0065
Protein37.96 ± 0.3939.52 ± 0.28*

Data are means ± SD for seeds from at least three plants. Asterisks indicate statistically significant differences (*P < 0.05) between ‘DN47’ and ‘NIL-DN47-Δα’. Each amino acid is expressed using its three-letter abbreviation.

A.A.: amino acid; FAA: free amino acid; NIL: near-isogenic line’; T.A.A.: total amino acids; T.S.A.A: total sulfur-containing amino acids (Met+Cys).

Data are means ± SD for seeds from at least three plants. Asterisks indicate statistically significant differences (*P < 0.05) between ‘DN47’ and ‘NIL-DN47-Δα’. Each amino acid is expressed using its three-letter abbreviation. A.A.: amino acid; FAA: free amino acid; NIL: near-isogenic line’; T.A.A.: total amino acids; T.S.A.A: total sulfur-containing amino acids (Met+Cys). The increased content of FAAs in cgy-2-NIL was most pronounced for Arg, which increased by more than two-fold compared with DN47 (Table 1). In cgy-2-NIL, Arg comprised 36.81% of FAAs, with Asp and Glu providing a further 11.63% and 9.32%, respectively; the remaining 42.24% comprised various other FAAs. His concentration also increased by two-fold in cgy-2-NIL; however, its content was much lower than Arg. The general and significant increase in the constituent content of most FAAs resulted in a significant increase in the total essential FAA and total FAA contents (Table 1). The amino acid score was calculated according to the scoring pattern suggested by the FAO/WHO [29]. Both the total EAA content and the EAAI of cgy-2-NIL were higher than that of DN47 (Table 2). Our results suggested that the null allele of α-subunit positively affected the AA scores.
Table 2

Amino acid (A.A.) profile of mature seeds in ‘DN47’ and its near-isogenic line, ‘cgy-2 NIL’.

A.A.FAOmg/gPro.D47NIL (α-null)
mg/gPro.A.A. Sco. (%)mg/gPro.A.A.Sco. (%)
Essential amino acids
Thr4036.2790.6738.8897.21
Val5038.9077.8039.3078.61
Met+Cys3524.1568.9925.9874.22
Ile4038.9097.2539.9899.95
Leu7071.39101.9970.85101.21
Phe+Tyr6077.45129.0875.40125.67
Lys5560.33109.6859.63108.42
Trp10-Not determined-
TEAA360347.38350.03
EAAI(%)10079.2582.16
Non-essential amino acids
Asp101.42102.73
Ser47.6848.08
Glu156.13148.45
Gly38.2938.63
Ala38.6439.90
His23.5325.13
Arg62.0865.20
Pro46.7244.28
TAA861.87862.43

Data are expressed as means of triplicate experiments. Each amino acid is expressed using the three-letter code. A.A.: amino acid; Pro.: protein; A.A. Sco.: amino acid score; TEAA: total essential amino acids; EAAI: essential amino acid index; NIL: near-isogenic line cgy-2-NIL; TAA: total amino acids.

Data are expressed as means of triplicate experiments. Each amino acid is expressed using the three-letter code. A.A.: amino acid; Pro.: protein; A.A. Sco.: amino acid score; TEAA: total essential amino acids; EAAI: essential amino acid index; NIL: near-isogenic line cgy-2-NIL; TAA: total amino acids.

DEGs between ‘cgy-2-NIL’ and ‘DN47’

One of the primary goals of transcriptome sequencing is to compare the gene expression levels in two genotypes. A P-value < 0.05 and log2 (fold change) > 2 were used as the thresholds to judge the significant differences (enriched or depleted) in the gene expression profiles between cgy-2-NIL and DN47 at the same stage. Using these criteria, 20,295 DEGs were identified, which could be subdivided into 174, 151, 123, 158, and 2837 genes that varied in abundance at 18, 25, 35, 50, and 55 DAF, respectively (Fig 4). In general, throughout the five seed development stages, the total number of upregulated genes was less than the number of downregulated genes (Fig 4). Surprisingly, the maximum number of DEGs between cgy-2-NIL and DN47 was observed at 55 DAF. Furthermore, different from the other three stages (18, 50, and 55 DAF), there were more upregulated DEGs than downregulated DEGs at 25 and 35 DAF.
Fig 4

The number of differentially expressed genes (DEGs) between ‘cgy-2-NIL’ and ‘DN47’ at various developmental stages (18, 25, 35, 50, and 55 days after flowering (DAF)).

Numbers of up and downregulated genes are summarized.

The number of differentially expressed genes (DEGs) between ‘cgy-2-NIL’ and ‘DN47’ at various developmental stages (18, 25, 35, 50, and 55 days after flowering (DAF)).

Numbers of up and downregulated genes are summarized. To determine whether these gene expression profiles correlated with development stages, the RNA-seq data of the cgy-2-NIL and DN47 were subjected to hierarchical clustering analysis using the ‘H-clust (1.10.1)’ function (Fig 5). The samples were clustered together based on genes that showed similar expression patterns. Genes expressed at the same stage both in cgy-2-NIL and DN47 were clustered together in all cases. The clusters of 18 DAF and 25 DAF seeds, and 35 DAF and 50 DAF seeds were very closely positioned, respectively. The 55 DAF cluster was closest to the 35 and 50 DAF clusters, and the 18 and 25 DAF clusters were farthest from the other three clusters (Fig 5). The greatest changes in gene expression were seen between the 25 and 55 DAF clusters. Notably, the developmental order was broken by 55 DAF; neighboring stages did not cluster together in the same order as development.
Fig 5

Cluster dendrogram of differentially expressed genes in cgy-2 NIL and ‘DN47’ at five developmental stages (18, 25, 35, 50, and 55 days after flowering).

The comparison among different development stages between cgy-2-NIL and DN47 is shown in Fig 6. The majority of DEGs showed development-stage-specific expression. Seventeen DEGs were differentially expressed in all five stages. As shown in Table 3, among these 17 genes, only one signal transduction response regulator gene (Glyma11g15580) was upregulated during seed development. Five DEGs (Glyma02g41810, Glyma03g02370, Glyma04g41540, Glyma09g02600, and Glyma15g06160) were downregulated during seed development. The other 12 DEGs were differentially expressed among different developmental stages in both cgy-2-NIL and DN47, and exhibited two different expression patterns: one group, including two RNA recognition motif domain proteins (Glyma12g01350 and Glyma12g04710), one transcription factor MYC/MYB N-terminal (Glyma11g18290,) and one Ferritin-conserved site (Glyma01g31300) were upregulated in cgy-2-NIL only at 18 DAF, and were then downregulated in all subsequent stages. The other group, including Glyma02g04840, Glyma08g16310, Glyma11g25660, Glyma12g13920, Glyma20g16100, and Novel 100599, were upregulated in cgy-2-NIL at 25 DAF and 35 DAF, and downregulated at 18, 50, and 55 DAF.
Fig 6

Venn diagram showing the overlap of identified differentially expressed genes (DEGs) between ‘cgy-2NIL’ and ‘DN47’ at 18, 25, 35, 50, and 55 days after flowering.

Table 3

Seventeen genes with altered expression between ‘cgy-2 NIL’ and ‘DN47’ across five developmental stages.

NO.DEG IDLog2-fold changeDESCRIPTION
18 DAF25 DAF35 DAF50 DAF55 DAF
1GLYMA01G313000.791−1.889−1.969−1.590−3.658Ferritin, conserved site||Ferritin||Ferritin- like di-iron domain||Ferritin-related||Ferritin/DPS protein domain||Ferritin/ribonucleotide reductase-like
2GLYMA02G04840−2.1902.2552.190−4.465−2.875Protein of unknown function DUF241, plant
3GLYMA02G41810−1.321−1.125−0.759−0.989−1.103Regulator of chromosome condensation, RCC1||Regulator of chromosome condensation 1/beta-lactamase-inhibitor protein II
4GLYMA03G02370−0.632−0.646−1.399−1.554−0.826C2 calcium-dependent membrane targeting
5GLYMA04G41540−0.373−0.886−0.826−0.837−2.937Glutamate synthase, NADH/NADPH, small subunit 1||"Glutamate synthase, alpha subunit, C-terminal"||"Glutamate synthase, central-C"
6GLYMA08G16310−0.7512.0111.320−2.468−1.264-
7GLYMA09G02600−0.673−1.084−0.929−2.871−3.122Haem peroxidase, plant/fungal/bacterial||Haem peroxidase||Peroxidases haem-ligand binding site||Plant peroxidase
8GLYMA11G155800.6030.7160.7401.6071.567Signal transduction response regulator, receiver domain||CheY-like superfamily
9GLYMA11G182900.327−0.524−0.584−1.232−1.694Myc-type, basic helix-loop-helix (bHLH) domain||Transcription factor MYC/MYB N-terminal
10GLYMA11G25660−3.2181.6371.709−4.280−3.175EF-Hand 1, calcium-binding site||Calcium-binding EF-hand||EF-hand-like domain
11GLYMA12G013500.272−0.661−0.793−1.823−2.004Nucleotide-binding, alpha-beta plait||"Zinc finger, CCCH-type"||RNA recognition motif domain
12GLYMA12G047010.348−1.217−0.846−1.019−1.247Nucleotide-binding, alpha-beta plait||RNA recognition motif domain
13GLYMA12G13920−2.0611.1521.721−1.622−0.980Glutaredoxin-like, plant II||Glutaredoxin||Thioredoxin-like fold
14GLYMA15G06160−0.371−0.481−0.660−1.230−0.774Pseudouridine synthase, catalytic domain||Dyskerin-like||PUA-like domain||Pseudouridine synthase II
15GLYMA16G07750−1.014−0.947−0.793−1.497−1.077-
16GLYMA20G16100−0.5021.0010.859−1.231−1.507Development/cell death domain
17Novel00599−3.6252.3782.300-−2.038
The top 20 genes that showed high-level differential expression related to the α-null mutation were ranked and are shown in Table 4. Glyma17g34220 (encoding alpha crystalline) and another six genes (Glyma13g11840, Glyma13g1189, Glyma13g11961, Glyma13g12033, Novel00815, and Novel01348), which were not annotated, were all downregulated at 18 DAF. Expression of Glyma13g11840 (no annotation) was downregulated by 12.16-fold, which was the most highly downregulated of all the DEGs identified in our data. Glyma20g28660 showed the highest differential expression related to α-null at 25, 35, and 50 DAF, and was downregulated by 9.32-fold at 25 DAF, 9.49-fold at 35 DAF, and 8.8-fold at 50 DAF. At 55 DAF, two genes (Glyma06g02690 and Glyma04g02660) encoding a Gibberellin-regulated protein and two genes (Glyma10g39760 and Glyma07g40110) encoding Concanavalin A-like lectins were downregulated by 9.03-fold and 7.91-fold, and by 8.47-fold and 6.93-fold, respectively. In addition, another four genes (Novel01985, Novel02415, Glyma15g10450, and Glyma16g03600) were upregulated at 55 DAF. Expression of Glyma15g10450, encoding a protein arginine N-methyltransferase, was upregulated by 8.05-fold; and Glyma16g03600, encoding an aminotransferase that takes part in cysteine and methionine metabolism, was upregulated by 7.56-fold. We hypothesized that these genes are putativelyα-null-related transcripts. Based on obtained DEGs information and bioinformatics, we will conduct further studies focused on gene function identification of the above-mentioned DEGs.
Table 4

The top 20 genes showing high-level differential expression related to the α-null mutation.

DEG IDStage(DAF)Down/Up regulationlog2-fold changeKEGGDescription
GLYMA13G1184018Down−12.162-
GLYMA13G1189518Down−8.0568-
GLYMA13G1196118Down−7.4505
GLYMA13G1203318Down−9.3913-
GLYMA17G3422018Down−8.145Protein processing in endoplasmic reticulumAlpha crystallin/Hsp20 domain||HSP20-like chaperone
Novel0081518Down−7.0188
Novel0134818Down−7.7437
GLYMA20G2866025Down−9.32
GLYMA20G2866035Down−9.4945Cupin 1||RmlC-like cupin domain||RmlC-like jelly roll fold
GLYMA20G2866050Down−8.8458Cupin 1||RmlC-like cupin domain||RmlC-like jelly roll fold
GLYMA15G1045055Up8.0507Protein arginine N-methyltransferase||S-adenosyl-L-methionine-dependent methyltransferase-like
GLYMA16G0360055Up7.5585Cysteine and methionine metabolismAminotransferase, class I/class II||"Aminotransferases, class I, pyridoxal-phosphate-binding site"||"Pyridoxal phosphate-dependent transferase, major region, subdomain 1"
Novel0198555Up8.6001
Novel0241555Up8.0925
GLYMA04G0266055Down−7.914Gibberellin-regulated protein
GLYMA06G0269055Down−9.0289Gibberellin-regulated protein
GLYMA07G3922055Down−7.0642Petal formation expressed
GLYMA07G4011055Down−6.9337Concanavalin A-like lectin/glucanase, subgroup||"Protein kinase, ATP binding site"||"Protein kinase, catalytic domain"||"Serine/threonine-protein kinase, active site"
GLYMA10G3976055Down−8.4718Concanavalin A-like lectin/glucanase, subgroup||"Glycoside hydrolase, family 16"||"Glycoside hydrolase, family 16, active site"||"Xyloglucan endo-transglycosylase, C-terminal"
GLYMA15G1467555Down−7.4518-

DEG = differentially expressed gene; DAF = days after flowering

DEG = differentially expressed gene; DAF = days after flowering

Functional annotation and pathway assignment

GO analysis was used to annotate the identified significant DEGs between cgy-2-NIL and DN47. Three main categories, biological process, molecular function, and cellular component, in developing seeds of cgy-2-NIL vs. DN47 at five stages (18, 25, 35, 50, and 55 DAF) are shown in Table 5. GO category enrichment analysis (P-value < 0.05) revealed different results in different stages. A similar GO category distribution pattern of transcripts was found at 18, 35, and 55 DAF (Table 5). For the biological process function, eight categories were identified, and the maximum number of DEGs was associated with the term ‘biosynthetic process’ at 18, 35, and 55 DAF. Eleven categories were identified as ‘cellular component’, and the terms ‘cellular component’ (18.89%, 19.95%, 22.26%), ‘cell’ (12.48%, 13.07%, 13.01%), and ‘cell part’ (12.48%, 13.07%, 13.01%) were the most abundant at 18, 35, and 55 DAF, respectively. In terms of molecular function, the most abundant DEGs were involved in structural molecular activity (64.41%, 58.10%, 58.36%) and structural constituents of ribosome (35.39%, 41.90%, 41.64%) at 18, 35, and 55 DAF, respectively. However, the GO category distributions of the transcripts at 25 and 50 DAF were quite different (Table 5). Through alignment with KEGG database, 6627 unigenes were annotated to 37 terms of GO classification at 25 DAF. Among these groups, ‘various biosynthetic process’ and ‘regulation of various metabolic process’ were dominant within the ‘biological process’ category. Only ‘apoplast’ was detected in the ‘cellular component’ category, and ‘ion binding’, ‘purine ribonucleoside triphosphate binding’, and ‘ATP binding’ were dominant in the molecular function category (Table 5) at 25 DAF. In addition, at 50 DAF, only seven terms belonging to ‘cellular component’ were annotated by GO category enrichment analysis.
Table 5

Summary of Gene Ontology (GO) terms for differentially expressed genes (DEGs) at different developmental stages (P < 0.05).

GO terms cgy-2NILvs. DN47DESCRIPTIONNumber of DEGs
18DAF25DAF35DAF50DAF55DAF
Biological processbiosynthetic process5977602399
organic substance biosynthetic process5777272290
cellular biosynthetic process5727152232
cellular macromolecule biosynthetic process4745791752
macromolecule biosynthetic process4765821764
gene expression4555501650
single-organism carbohydrate catabolic process27163455
translation158145342
organic cyclic compound biosynthetic process283
RNA metabolic process277
cellular nitrogen compound biosynthetic process275
heterocycle biosynthetic process274
aromatic compound biosynthetic process269
nucleobase-containing compound biosynthetic process256
RNA biosynthetic process240
transcription, DNA-dependent232
regulation of metabolic process225
regulation of cellular metabolic process214
regulation of macromolecule metabolic process213
regulation of primary metabolic process212
regulation of cellular biosynthetic process210
regulation of biosynthetic process210
regulation of macromolecule biosynthetic process210
regulation of cellular biosynthetic process210
regulation of nucleobase-containing compound metabolic process209
regulation of nitrogen compound metabolic process209
regulation of gene expression208
regulation of transcription, DNA-dependent204
regulation of RNA metabolic process204
regulation of RNA biosynthetic process204
cellular component movement44
microtubule-based process41
microtubule-based movement33
cellular glucan metabolic process22
glucan metabolic process22
Cellular componentcellular_component90213684164
cell5968962435
cell part5968962435
intracellular5738462305
intracellular part5187952143
intracellular organelle4165911654
macromolecular complex3475061238
cytoplasm2943991054
cytoplasmic part244305754
ribonucleoprotein complex153139296
ribosome137117232
apoplast12
cellular component organization or biogenesis54
cellular component organization49
cell morphogenesis11
cellular component morphogenesis11
anatomical structure morphogenesis11
cellular developmental process11
single-organism developmental process11
Molecular functionstructural molecule activity418190164
structural constituent of ribosome231137117
ion binding569
purine ribonucleoside triphosphate binding310
ATP binding296
cytoskeletal protein binding59
tubulin binding43
microtubule binding42
motor activity41
microtubule motor activity33
xyloglucan:xyloglucosyl transferase activity12

DAF: days after flowering.

DAF: days after flowering. Pathway-based analysis is thought to provide a basic platform for the systematic analysis of DEGs involved in metabolic or signal transduction pathways. In this study, KEGG analyses were used to analyze gene function in terms of networks of gene products. Two types of DEGs, those up and downregulated at different development stages, were classified by KEGG, respectively (Table 6). In general, KEGG analysis assigned the DEGs (P < 0.05) of cgy-2NIL and DN47 to 16, 3, 9, 4, and 12 metabolic pathways (each of which contained 4–175 DEGs) at 18, 25, 35, 50, and 55 DAF, respectively (Table 6). At 18 DAF, we found several significant expression changes related to amino acid metabolism and fatty acid metabolism, including 13 genes involved in beta-alanine metabolism, six genes involved in histidine metabolism, 12 genes involved in arginine and proline metabolism, six genes involved in lysine degradation, and nine genes involved in fatty acid degradation, all of which were significantly upregulated. In addition, 41 genes involved in biosynthesis of amino acids showed significantly downregulated expression at 18 DAF. The majority of DEGs appeared to be related to ‘plant-pathogen interaction’ (32 genes, upregulated), ‘Ribosome biogenesis in eukaryotes’ (16 genes, downregulated), and ‘DNA replication’ (18 genes, downregulated) at 25 DAF. At 35 DAF, upregulated DEGs were assigned to ‘Ribosome’ and ‘photosynthesis’, while downregulated DEGs were assigned to seven KEGG pathways (protein processing in endoplasmic reticulum, ribosome biogenesis in eukaryotes, spliceosome, endocytosis, ABC transporters, RNA transport, and ubiquitin-mediated proteolysis). The DEGs identified at 55 DAF were assigned to 12 KEGG pathways. Pathways such as ribosome (145 genes, upregulated), biosynthesis of amino acids (114 genes, downregulated), and carbon metabolism (122 genes, downregulated) were highly represented.
Table 6

Kyoto Encyclopedia of Genes and Genomes (KEGG) assignment of differentially expressed genes (DEGs) identified in five developmental stages.

cgy-2NIL>DN47Number of DEGsCorrected P-value
pathway18 DAF25 DAF35 DAF50 DAF55 DAF18 DAF25 DAF35DAF50 DAF55 DAF
Circadian rhythm—plant19---261.51E-11--2.15E-02
beta-Alanine metabolism13----1.28E-06----
Fatty acid degradation9---254.49E-04---2.15E-02
Histidine metabolism6----5.34E-04----
Arginine and proline metabolism12----7.32E-04----
Lysine degradation6----2.26E-03----
Plant-pathogen interaction-32----5.06E-09---
Ribosome--96-145--4.64E-24-2.07E-09
Photosynthesis--415---3.07E-192.86E-02-
Protein processing in endoplasmic reticulum---2077---3.79E-101.89E-02
Photosynthesis—antenna proteins---4----3.46E-03-
Spliceosome----81----4.27E-06
Peroxisome----40----2.00E-03
RNA transport----62----2.74E-03
Sulfur relay system----9----2.15E-02
Valine, leucine and isoleucine----23----2.40E-02
degradation
Carotenoid biosynthesis----18----3.59E-02
cgy-2NIL<DN47
Ribosome175----8.27E-59----
Protein processing in endoplasmic reticulum59-35--7.90E-08-1.56E-02--
Ribosome biogenesis in eukaryotes271635--2.26E-044.96E-073.02E-10--
DNA replication1818-12-5.37E-034.85E-11-2.18E-07-
Carbon fixation in photosynthetic organisms20----7.56E-03----
Glycolysis / Gluconeogenesis28----3.02E-02----
Photosynthesis—antenna proteins8----3.45E-02----
Taurine and hypotaurine metabolism7----4.26E-02----
Plant-pathogen interaction34----4.57E-02----
Biosynthesis of amino acids41---1144.92E-02---3.09E-02
Spliceosome--55----1.02E-12--
Endocytosis--29----3.12E-03--
ABC transporters--10----8.22E-03--
RNA transport--28----8.22E-03--
Ubiquitin-mediated proteolysis--22----4.35E-02--
Carbon metabolism----122----1.18E-02

DAF: days after flowering.

DAF: days after flowering.

Transcription factors (TFs) affected by the ‘α-null’ mutation

TFs are important proteins that control the flow of genetic information from DNA to RNA, and ultimately affect the growth and physiology of the plant. In the present study, 74 TFs were differentially expressed between cgy-2-NIL and DN47, when a fold change ≥ 1 and P < 0.05 were used as cutoff values (Table 7). These genes were divided into different classes, as shown in Table 7. These TFs included BREVIS RADIX, GRAS, jumonji, GATA, SBP-box, and TCP. The most abundant TF group was GRAS. Among all the identified GRAS TFs, four (Glyma06G41500, Glyma07G39650, Glyma12G34420, and Glyma13G36120) were downregulated at 18 DAF in cgy-2-NIL; however, nine GRAS TFs were significantly upregulated at 25 DAF in cgy-2-NIL. Only one GRAS TF, Glyma12G34420, was identified as upregulated at 35 DAF. Eighteen GRAS TFs that were differentially expressed at 55 DAF displayed different expression patterns: 11 were downregulated in cgy-2-NIL, while seven were upregulated. The 55 DAF stage was characterized by the highest number of differentially expressed TFs in cgy-2-NIL compared with DN47, and there were more downregulated than upregulated TFs (39 vs. 20). Notably, 13 MADS-box TFs were all downregulated at 55 DAF. By contrast, the fewest number of TF genes was found at 50 DAF; only one upregulated TF, TCP (Glyma03G02090) and one downregulated TF, GATA (Glyma04G01090) were identified. In addition, we observed that in the 35 DAF whole seed, five groups of TFs were differentially expressed, including BREVIS RADIX (Glyma09G34601 and Glyma16G17590), GRAS (Glyma12G34420), jumonji (Glyma09G34040) and SBP-box (Glyma04G37391).
Table 7

Summary and annotation of transcription factors (TFs) selected using RPKM analysis of RNA-seq data.

Category of TFGene IDGene AnnotationRPKM
18 DAF25 DAF35 DAF50 DAF55 DAF
α-nullDN47α-nullDN47α-nullDN47α-nullDN47α-nullDN47
BREVIS RADIXGLYMA09G07930Transcription factor BREVIS RADIX1.690.68
GLYMA09G346018.6815.39
GLYMA16G175901.202.27
GRASGLYMA01G18040Transcription factor GRAS0.983.89
GLYMA01G332700.611.54
GLYMA01G401802.021.03
GLYMA01G436202.021.03
GLYMA02G467303.631.34
GLYMA03G037602.365.89
GLYMA04G420900.812.61
GLYMA05G0349017.297.03
GLYMA06G116101.870.73
GLYMA06G415000.591.613.371.28
GLYMA07G396509.4025.9130.0613.50
GLYMA08G437800.754.82
GLYMA09G014408.482.97
GLYMA09G041100.121.41
GLYMA10G376408.874.01
GLYMA11G018500.321.00
GLYMA11G147100.451.56
GLYMA11G174901.373.51
GLYMA12G020604.051.82
GLYMA12G066700.481.29
GLYMA12G167502.800.71
GLYMA12G344204.0212.1318.785.2897.6424.52
GLYMA13G09220
GLYMA13G3612011.1526.5646.0020.59
GLYMA15G123203.051.35
GLYMA17G1403020.788.61
GLYMA18G452201.094.16
GLYMA20G3015021.617.71
IIcGLYMA14G24776Transcription factor IIIC, 90kDa subunit, N-terminal6.272.53
GLYMA19G40560Transcription factor, WRKY group IIc0.972.62
jumonji, JmjNGLYMA10G35350Transcription factor jumonji, JmjN7.924.35
GLYMA09G34040Transcription factor jumonji, JmjN2.644.91
MYC/MYBGLYMA06G04550Transcription factor MYC/MYB N-terminal0.212.21
GLYMA17G315374.001.87
TFIIEGLYMA05G38060Transcription factor TFIIE beta subunit,22.6611.71
GLYMA05G07910Transcription factor TFIIE, alpha subunit8.063.81
GATAGLYMA04G01090Transcription factor, GATA, plant16.267.060.331.86
GLYMA10G354700.842.57
GLYMA16G271710.745.79
K-box MADS-boxGLYMA01G08130Transcription factor, K-box1.643.87
GLYMA02G134010.761.87
GLYMA04G436407.4020.22
GLYMA05G072860.121.98
GLYMA06G482708.2424.63
GLYMA08G4230015.2039.28
GLYMA11G161050.162.52
GLYMA11G368900.732.89
GLYMA13G067305.3017.27
GLYMA13G295100.812.07
GLYMA14G031002.015.78
GLYMA18G125901.365.20
GLYMA19G0432016.7253.88
NFYB/HAP3CBF/NF-YGLYMA09G01650Transcription factor, NFYB/HAP3Transcription factor, CBF/NF-Y0.622.99
GLYMA10G0560620.689.31
GLYMA17G009500.614.26
SBP-boxGLYMA02G13371Transcription factor, SBP-box0.361.07
GLYMA03G271950.361.07
GLYMA04G373910.251.48
TCPGLYMA03G02090Transcription factor, TCP1.230.240.060.40
GLYMA05G003001.744.07
GLYMA05G011312.810.79
GLYMA10G065155.302.02
GLYMA12G289700.462.65
GLYMA12G357202.405.57
GLYMA18G503713.551.60
DELLA GRASGLYMA11G33720Transcription factor DELLA, N-terminal4.3417.33
GLYMA18G045004.4514.14
OthersGLYMA11G14450Transcription factor IIA, alpha/beta subunit, N-terminal46.3823.63
GLYMA03G28000Transcription factor IIS, N-terminal14.766.34
GLYMA19G31830Transcription factor TFIIB, conserved site3.551.00
GLYMA13G07720Transcription factor, MADS-box0.261.02

RPKM: reads per kilobase of transcript per million reads mapped; DAF: days after flowering.

RPKM: reads per kilobase of transcript per million reads mapped; DAF: days after flowering.

Gene models annotated as Cupin proteins

Previous studies have characterized the cupins as important allergens in peanuts and soybeans [38-40]. The majority of cupin allergens belong to either the 11S legumin-like or the 7S vicilin-like seed storage globulin families. To better characterize the effect of α-null mutations on the differential expression of allergen genes, particular attention was paid to the cupin protein family in cgy-2-NIL. In the present study, 18 genes in Table 8 are annotated as encoding cupin proteins. In general, these genes showed peak expression (in RPKM) at 35 or 50 DAF, with RPKMs ranging from 0 to 52124.19. Most of these cupin genes were downregulated in cgy-2-NIL compared with DN47 throughout the five development stages.
Table 8

Summary of differentially expressed genes (DEGs) annotated as Cupin proteins.

CUPIN DEGsRPKM
18 DAFLog2 fold change25 DAFLog2 fold change35 DAFLog2 fold change50 DAFLog2 fold change55 DAFLog2 fold change
GENE IDHomologscgy-2NILDN47cgy-2NILDN47cgy-2NILDN47cgy-2NILDN47cgy-2NILDN47
GLYMA10G391507S(αˊ-subunit)1.15971.7119-0.60443398.46107830.8560-1.311218998.740018363.3300-0.098212163.320012909.6100-0.18351283.19204885.0200-1.7435
GLYMA20G286507S(α-subunit)0.00000.0075#2.033351.8679-4.77086.161990.2581-4.05334.600637.4387-3.11520.12909.2461-5.9973
GLYMA20G286607S(α-subunit)0.00000.0000#0.026515.7801-9.32000.139291.3088-9.49450.082535.4523-8.84580.00001.2569#
GLYMA20G284607S(β-subunit)0.00000.0000#0.09040.5778-2.8160291.5820160.93760.67531082.1720718.66990.5156113.0759508.6404-1.9759
GLYMA20G286407S(β-subunit)0.07860.07210.07250.22413.2988-3.9921500.7281371.26320.25951258.49401091.46600.1226158.1022606.8583-1.7488
GLYMA10G033907S1.13801.6587-0.58591095.89002584.6960-1.34445619.98606335.0460-0.32044041.84204589.9470-0.2679601.91223318.1880-2.2761
GLYMA02G164407S0.78380.60710.326573.8799260.0706-1.92911747.42701922.5170-0.27921811.17701356.95700.3192645.7540586.55640.3266
GLYMA10G391707S0.36240.3954-0.170847.1786122.0998-1.4848637.6302734.9372-0.35101636.77601094.22500.47431568.45201781.49700.0035
GLYMA03G3203011S(Gy1)0.00900.0887-3.32691519.84806668.6300-2.249026382.620035866.6200-0.581346783.760052124.1900-0.24041808.541016498.6800-2.9965
GLYMA03G3202011S(Gy2)0.01400.0342-1.3358734.06153416.8350-2.334416533.940021521.1000-0.522333946.040036257.4200-0.17941675.717011668.7100-2.6104
GLYMA19G3478011S(Gy3)0.01850.0341-0.909220.397938.4533-1.04544697.74804131.73500.06066783.23506454.2300-0.039726.6057260.1870-3.1019
GLYMA10G0428011S(Gy4)0.00000.0506#188.94221070.9180-2.627714273.150018260.9700-0.487618121.250021300.1900-0.3150429.27194996.3280-3.3479
GLYMA13G1845011S(Gy5)0.00000.0253#93.8437526.5713-2.615512032.550014774.3600-0.430320977.460022500.2000-0.1946412.35184566.2670-3.2790
GLYMA19G3477011s(Gy7)0.29840.18880.62301.83431.52770.17466.62075.00270.236813.164566.5259970.937520.8093617.720770.4226
GLYMA08G1344011S1.50471.5675-0.10206.75387.3609-0.223533.461743.2815-0.53049.662211.0915-0.26350.66322.7409-0.7581
GLYMA15G0471011S120.6579102.06190.198575.739275.3835-0.097362.850643.75490.359421.576719.71370.04683.97807.8482-0.4931
GLYMA16G009800.45810.26410.74961.61471.32150.19215.82742.74900.92210.05880.1320-1.22240.00000.0602#
GLYMA10G391610.09710.0990-0.07460.16820.3765-1.22152.46290.57721.93520.77140.0000#0.08490.04950.9501

Shadowing indicates a significant change in gene expression between ‘cgy-2 NIL’ and ‘DN47’. RPKM: reads per kilobase of transcript per million reads mapped; DAF: days after flowering. #: One of the data is zero, cannot use multiple expression.

Shadowing indicates a significant change in gene expression between ‘cgy-2 NIL’ and ‘DN47’. RPKM: reads per kilobase of transcript per million reads mapped; DAF: days after flowering. #: One of the data is zero, cannot use multiple expression. Among the 18 cupin genes, five belong to the β-conglycinin subunit gene family, including Glyma10g39150 encoding the α′-subunit, whereas Glyma20g28460 and Glyma20g28640 encode the β-subunit, and Glyma20g28650 and Glyma20g28660 encode the α-subunit [41] (http://www.Phytozome.net/soybean) (Table 8). The expression of α′-subunit gene (Glyma10g39150) was detected at 18 DAF, which is earlier than both the α-and β-subunit genes and peaked at 35 DAF. Its RPKM level was much higher than both the α- and β-subunit genes during the five developmental stages. Glyma10g39150 (α′-gene) showed downregulated expression throughout the five development stages in cgy-2-NIL (Table 8). Notably, the α-null mutation was associated with significantly reduced expression of both α-subunit genes, Glyma20G28650 and Glyma20G28660, in proportional amounts. The expression level of Glyma20g28650 was consistently higher than Glyma20G28660 from 25 to 55 DAF in cgy-2-NIL. The two genes showed almost no expression at 18 DAF, and began to be highly expressed at 25 DAF, showing peak expression at 35 DAF, which then declined until 55 DAF in DN47. Similar expression patterns of Glyma20G28650 and Glyma20G28660 were found in cgy-2-NIL; however, the level of expression of Glyma20G28650 was much lower in cgy-2-NIL than in DN47 at the same stage, while Glyma20G28660 was barely expressed throughout the five developmental stages in cgy-2-NIL. The two β-subunit genes of β-conglycinin, Glyma20g28460 and Glyma20g28640, also showed different expression levels between cgy-2-NIL and DN47. The expression levels of Glyma20g28460 and Glyma20g28640 in cgy-2-NIL were lower than those in DN47 at 25 DAF (by 2.8160-fold and 3.9921-fold, respectively), and at 55DAF (by 1.9759- and 1.7488-fold, respectively); However, in the other stages (35 and 50 DAF), the two β-subunit genes in cgy-2-NIL showed higher expression (Log2 fold change from 0.1226 to 0.6753) than in DN47. In addition, another six differentially expressed gene IDs matched glycinin subunit genes Gy1-7 (Glyma03g32030 to Gy1, Glyma03g32020 to Gy2; Glyma19g34780 to Gy3; Glyma10g04280 to Gy4; Glyma13g18450 to Gy5; Glyma19g34770 to Gy7) [42-44]. Among these six genes, the expressions of Gy1, Gy2, Gy4, and Gy5 in cgy-2-NIL were all lower than that in DN47 throughout five developing stages, and these genes showed a similar pattern of expression, i.e., starting at about 18–25 DAF, showing a peak in RPKM at 50 DAF, and then declining rapidly thereafter (Table 8). The expression level of Gy3 in cgy-2-NIL was lower than that in DN47 at 18, 25, and 55 DAF, but higher at 35 and 50 DAF. Gy7 expression was drastically lower than the other five glycinin genes, both in cgy-2-NIL and DN47 from 25 DAF to 55 DAF. Furthermore, cgy-2-NIL had higher expression levels of Gy7 than that in DN47 throughout the five stages examined in the present study.

qRT-PCR validation of differential gene expression in cgy-2-NIL and DN47

We used qRT-PCR to validate selected DEGs identified from the RNA-seq data. Six DEGs (GLYMA06G02690, GLYMA07G39650, GLYMA11G33720, GLYMA12G34420, GLYMA18G04500 and GLYMA20G28530) that were differentially expressed in all five stages were selected, which included up- and downregulated genes between cgy-2-NIL and DN47. The relative expression changes of the selected genes are shown in Fig 3: a positive correlation (R2 = 0.6069) between the RNA-seq data and qRT-PCR data was detected (Fig 3B). All six selected genes showed consistent up or downregulated expression patterns throughout all five detected stages, respectively, confirming the RNA-seq data (Fig 3C).

Discussion

The α-subunit is one of the major components of soybean seed storage proteins; therefore, the complete deficiency of the α-subunit should change the gene’ expression profiles and metabolic pathways during seed maturation. NILs are valuable genetic resources to identify genomic regions and alleles responsible for trait variation [45], and are also particularly suitable for genetic analyses of transcriptome and proteome variations. To further understand the potential mechanisms involved in the regulation of the α-null mutation, we have used RNA-seq to investigate global gene expression changes over five stages of soybean cotyledon development in seeds of α-subunit-deficient NIL lines (cgy-2-NIL). We have identified several critical genes that were possibly associated with the α-null mutation. Only Glyma20G28660 was annotated as the α-subunit gene of β-conglycinin, and appeared to be significantly downregulated throughout the three green stages (25, 35, and 50 DAF) of development studied here. Surprisingly, at 55 DAF (the desiccating stage of development), the number of DEGs was the highest. This observation is consistent with the results of a previous report [26], in which many genes showed peak expression at the latter stages of seed maturation. These genes were annotated as being TFs or related to protein degradation [26]. Our analyses have also resulted in the identification of interesting late expressed DEGs (at 55 DAF). In particular, Glyma16G03600, which is involved in cysteine and methionine metabolism, was upregulated by 7.56-fold in ‘cgy-2-NIL’, and Glyma04G02660 and Glyma06G02690, which were annotated as gibberellin-regulated protein genes. We also predicted many novel candidate genes that were associated with the α-null mutation, which provide a strong basis for future research on determining the molecular mechanism of α-subunit-null deficiency. To determine whether the differential expression of genes such as Glyma16G03600, Glyma04G02660, and Glyma06G02690 have a direct relation to α-subunit-null mutation, the function of these DEGS will be studied by RNA interference or by overexpression in transgenic plants in the future. This could lead to a better understanding of the molecular regulation of storage protein subunit accumulation in the α-null mutant. The cupins are a large superfamily, named on the basis of a conserved ‘double-stranded β-helix’ barrel-like structure (‘cupa’ means ‘small barrel’ in Latin). The majority of cupin allergens were originally discovered using a conserved motif found within the 7S vicilin-like or 11S legumin-like seed storage globulin families from higher plants [46]. The cupin superfamily of proteins possesses remarkable functional diversity, with representatives found in the Archaea, Eubacteria, and Eukaryota [47-49]. Previous studies characterized the majority of cupin allergens as belonging to either the 11S legumin-like or 7S vicilin-like seed storage globulin families. In our study, 16 storage protein subunit genes, eight 7S-related subunits, and eight 11S-related subunits were included in the cupin group (Table 8). Soybean seeds contain between 35 and 45% protein on a dry weight basis, of which about 70% consists of the two major storage proteins, 7S globulin (β-conglycinin) and 11S (glycinin). Development changes in the synthesis of β-conglycinin and glycinin have been described previously [50,25,26]. In the present study, the expression of various subunit gens of β-conglycinin and glycinin in both cgy-2-NIL and DN47 showed similar developmental expression patterns: they presented a bell-shaped pattern of expression that started at 18–25 DAF, reached a maximum at 35 DAF or 50 DAF, and declined rapidly thereafter. The α′-and α-subunit genes of β-conglycinin reached their expression peaks (at 35 DAF) before the β-subunit genes of β-conglycinin and five glycinin, Gy1–Gy5 subunit genes (at 50 DAF). These results were similar to earlier observations [50, 25, 26]. However, the expression levels of 18 cupin genes in cgy-2-NIL were significantly different to those in DN47. The α-null mutation caused almost all the β-conglycinin (α′-, α-, and β-subunit) genes and glycinin (Gy1-, Gy2-, Gy3-, Gy4-, Gy5-, -subunits) genes to show downregulated expression in at least two stages of development studied here. The expressions of various β-conglycinin and glycinin subunit genes were regulated coordinately in the cgy-2-NIL, which might be responsible for the altered amino acid composition and improved protein quality. Previous analysis of β-conglycinin-deficient lines revealed that the loss of β-conglycinin was compensated for by an increase in the abundance of glycinin [1]. Glycinin, an 11S globulin, is the predominant seed storage protein in soybean, and makes an important contribution to the nutritional quality of soy protein. In the present study, compared with DN47, the α-null mutation caused glycinin Gy3 to be upregulated at 35 DAF and Gy7 was upregulated throughout all five stages. To date, five glycinin genes, Gy1–Gy5 have been described in detail. Gy4 and Gy5 encode proteins that have lower concentration of sulfur amino acids than the proteins derived from Gy1, Gy2, and Gy3 [51]. Furthermore, Belinson et al. [44] identified and mapped a new functional glycinin gene, Gy7, which encodes the sixth glycinin subunit Gy7. Their data revealed that the steady-state amount of mRNA encoding Gy7 at seed mid-maturation is an order of magnitude less than the mRNA encoding the five other glycinin subunits [44]. Similar results were obtained in our study, which further confirmed that the GY7 gene has a lower expression level than the five other glycinin subunits from 25 to 55 DAF, both in cgy-2-NIL and DN47. To date, little is known about the effect of the Gy7 subunit on protein nutritional quality, tofu-making quality, and its health benefits. Different from the other five glycinin genes, GY7 expression in cgy-2-NIL slightly exceeded that of DN47 throughout the five stages identified in the present study, and showed a unique developmental expression pattern in both cgy-2-NIL and DN47, i.e., increased from the 18 DAF until reaching a peak at 55 DAF. The upregulated expressions of Gy3 and Gy7 might, at least in part, contribute to the modified final seed protein content in cgy-2-NIL.

Conclusions

We present an overview of genes whose expression was affected by the ‘α-null’ mutation in soybeans. A number of soybean genes with annotations related to cupin allergen proteins, transcription factors, and other processes were differentially expressed in cgy-2-NIL. Some of these genes may be candidates for hypoallergenic soybean breeding. The cgy-2 allele in the homozygous form modified the expression level of various β-conglycinin and glycinin cupin-family-genes. The desiccating stage of development (55DAF), is a critical period of differential gene expression. Our findings will help provide a detailed understanding of the α-subunit-null mechanism. In addition, the cgy-2 allele was validated as an effective and useful allele for soybean breeding programs that aim to modify protein quality and reduce allergenicity.
  29 in total

Review 1.  Cupins: the most functionally diverse protein superfamily?

Authors:  Jim M Dunwell; Alan Purvis; Sawsan Khuri
Journal:  Phytochemistry       Date:  2004-01       Impact factor: 4.072

2.  Expression of the stress-related genes for glutathione S-transferase and ascorbate peroxidase in the most-glycinin-deficient soybean cultivar Tousan205 during seed maturation.

Authors:  Tomoyo Narikawa; Tomoko Tamura; Kazuhiro Yagasaki; Kaede Terauchi; Kazutsuka Sanmiya; Yoshiro Ishimaru; Keiko Abe; Tomiko Asakura
Journal:  Biosci Biotechnol Biochem       Date:  2010-09-07       Impact factor: 2.043

3.  Microbial relatives of seed storage proteins: conservation of motifs in a functionally diverse superfamily of enzymes

Authors: 
Journal:  J Mol Evol       Date:  1998-02       Impact factor: 2.395

Review 4.  Cupins: a new superfamily of functionally diverse proteins that include germins and plant storage proteins.

Authors:  J M Dunwell
Journal:  Biotechnol Genet Eng Rev       Date:  1998

5.  Global gene expression profiles in developing soybean seeds.

Authors:  Tomiko Asakura; Tomoko Tamura; Kaede Terauchi; Tomoyo Narikawa; Kazuhiro Yagasaki; Yoshiro Ishimaru; Keiko Abe
Journal:  Plant Physiol Biochem       Date:  2011-12-30       Impact factor: 4.270

6.  Cleavage of structural proteins during the assembly of the head of bacteriophage T4.

Authors:  U K Laemmli
Journal:  Nature       Date:  1970-08-15       Impact factor: 49.962

Review 7.  Soybean allergens and hypoallergenic soybean products.

Authors:  A Ogawa; M Samoto; K Takahashi
Journal:  J Nutr Sci Vitaminol (Tokyo)       Date:  2000-12       Impact factor: 2.000

8.  Inheritance of alleles for Cgy1 and Gy 4 storage protein genes in soybean.

Authors:  K Kitamura; C S Davies; N C Nielsen
Journal:  Theor Appl Genet       Date:  1984-06       Impact factor: 5.699

9.  Molecular cloning and epitope analysis of the peanut allergen Ara h 3.

Authors:  P Rabjohn; E M Helm; J S Stanley; C M West; H A Sampson; A W Burks; G A Bannon
Journal:  J Clin Invest       Date:  1999-02       Impact factor: 14.808

10.  Manipulation of amino acid composition in soybean seeds by the combination of deregulated tryptophan biosynthesis and storage protein deficiency.

Authors:  Yoichi Kita; Yumi Nakamoto; Masakazu Takahashi; Keisuke Kitamura; Kyo Wakasa; Masao Ishimoto
Journal:  Plant Cell Rep       Date:  2009-11-27       Impact factor: 4.570

View more
  2 in total

1.  Genome-wide analysis of sucrose synthase family in soybean and their expression in response to abiotic stress and seed development.

Authors:  Muhammad Zulfiqar Ahmad; Hafiz Ishfaq Ahmad; Asma Gul; Zamarud Shah; Bushra Ahmad; Shakeel Ahmed; Abdullah Ahmed Al-Ghamdi; Mohamed S Elshikh; Arshad Jamil; Jamal Abdul Nasir; Helena Dvořáčková; Jan Dvořáček
Journal:  PLoS One       Date:  2022-02-25       Impact factor: 3.752

Review 2.  Soybean genetic resources contributing to sustainable protein production.

Authors:  Bingfu Guo; Liping Sun; Honglei Ren; Rujian Sun; Zhongyan Wei; Huilong Hong; Xiaoyan Luan; Xiaobo Wang; Donghe Xu; Wenbin Li; Li-Juan Qiu
Journal:  Theor Appl Genet       Date:  2022-10-14       Impact factor: 5.574

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.