A comprehensive analysis of the Cupin gene family in soybean (Glycine max).

Xiaobo Wang1, Haowei Zhang1, Yali Gao1, Genlou Sun2, Wenming Zhang1, Lijuan Qiu3.   

Abstract

The cupin superfamily of proteins, which includes germin and germin-like proteins (GLPs) from higher plants, is known to play crucial roles in plant development and defense. To date, no systematic analysis of this family incorporating genome organization, gene structure and expression compendium has been conducted in soybean (Glycine max). In this study, 69 putative Cupin genes were identified in the soybean genome; they were non-randomly distributed on 17 of the 20 chromosomes. These Gmcupin proteins clustered phylogenetically into ten distinct subgroups, within which gene structures were highly conserved. Eighteen pairs (52.2%) of duplicated paralogous genes were preferentially retained in duplicated regions of the soybean genome. The distribution of GmCupin genes implied that long segmental duplications contributed significantly to the expansion of the GmCupin gene family. RNA-seq data analysis showed that most Gmcupins were differentially expressed in a tissue-specific pattern; the expression of some duplicated genes was partially redundant, while others showed functional diversity, suggesting that the Gmcupins have been retained by substantial subfunctionalization during soybean evolution. Selection analysis based on single nucleotide polymorphisms (SNPs) in cultivated and wild soybeans revealed that sixteen Gmcupins had selected site(s), and that all SNPs in the Gmcupin10.3 and Gmcupin07.2 genes were selected sites, implying that these genes may have undergone strong selection during soybean domestication. Taken together, our results contribute to the functional characterization of Gmcupin genes in soybean.

Year:  2014        PMID: 25360675      PMCID: PMC4215997          DOI: 10.1371/journal.pone.0110092

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The cupin superfamily of proteins, consisting mainly of the germin and germin-like protein (GLP) subfamilies, is extremely diverse in plants and possesses various enzymatic activities, including sugar-binding metal-independent epimerases and metal-dependent enzymes with dioxygenase and decarboxylase activities [1], [2]. Germin, initially identified as a specific marker of germination in wheat embryos [3], has been characterized as a homopentameric glycoprotein with oxalate oxidase (OxO) activity [4]. It is speculated to play significant roles in plant development and defense through the oxidative breakdown of oxalate, which generates H2O2 [5], [6]. Germin-like proteins (GLPs) share high sequence and structural similarity with cereal germins, but differ from germin in that most lack oxalate oxidase activity and instead possess superoxide dismutase (SOD) and phosphodiesterase activities [2], [7]–[9]. The cupin domain has been associated with diverse biological properties. For instance, a group of single-cupin-domain proteins, including two phosphomannose isomerases and two epimerases involved in cell wall synthesis, was identified in the Synechocystis PCC6803 genome [10]. Moreover, a duplicated, two-cupin-domain GLP showed close structural similarity to an oxalate decarboxylase from the fungus Collybia velutipes and is considered a putative progenitor of the storage proteins of land plants [10]. To date, a total of 27 GLP genes have been identified in Arabidopsis, and their expression varies among tissues such as roots, leaves and flowers [11]–[13]. Lapik et al. reported that the cupin-domain protein AtPirin1 interacts with a CCAAT box-binding transcription factor and serves as a downstream component of GPA1 in regulating seed germination and early seedling development [14]. 
Recently, two further Arabidopsis GLPs (PDGLP1 and PDGLP2), which interact with Cucurbita maxima PHLOEM PROTEIN 16 (Cm-PP16), were shown to regulate primary root growth by modulating phloem-mediated resource allocation between the primary and lateral root meristems [15]. The PDGLP1 signal peptide was shown to direct localization to the plasmodesmata (PD) by a novel mechanism involving the endoplasmic reticulum-Golgi secretory pathway. Further, in plum (Prunus salicina), two GLP-encoding genes (Ps-GLP1 and Ps-GLP2) were cloned, and their regulation was studied throughout fruit development and during maturation of early and late cultivars. Together, these studies suggest that GLPs may be involved in specific developmental stages in plants. The expression of cupin proteins can also be modulated by abiotic or biotic stresses, suggesting multifunctional roles in plant defense responses. For instance, a 66-kDa cupin protein, BspA ("boiling-stable protein"), which is highly expressed in cultured shoots of aspen (Populus tremula) under water stress, was considered to contribute to membrane stability [16]. However, how cupin proteins participate in plant defense is still not well defined. One germin-like gene (CchGLP), cloned from a geminivirus-resistant pepper and induced by ethylene and salicylic acid but not by jasmonic acid, encodes an enzyme with manganese superoxide dismutase (Mn-SOD) activity [17]. Mn-SOD activity has also been identified in GLPs isolated from tobacco and Barbula unguiculata [18]–[20]. Because plant Mn-SODs are distributed extracellularly as well as in mitochondria and peroxisomes, and are associated with defense against biotic stress [18], [21], [22], it is reasonable to speculate that cupin proteins may participate in plant defense by scavenging free radicals. The ubiquitous distribution of GLPs implies indispensable and fundamental roles in plants [23], [24]. 
In soybean, few studies have addressed the functional characterization of cupin proteins [25]. Completion of the soybean genome sequence has greatly facilitated the identification of gene families at the whole-genome level [26]. In the present study, we performed a genome-wide identification of Cupin domain-containing genes in soybean, followed by a detailed analysis of their sequence phylogeny, genome organization, gene structure, expression profiles and the selection effects acting on Gmcupin genes during soybean domestication. Our data contribute to the evolutionary and functional analysis of the Cupin gene family in soybean.

Materials and Methods

Sequence retrieval and phylogenetic analysis

The amino-acid sequence of the Cupin domain was used as a BLASTP query to search for potential Cupin-domain homologs in the whole-genome sequence of Glycine max at the Phytozome database (http://www.phytozome.net) [27]. All non-redundant hits with E-values < 1E-5 were collected. The presence of the Cupin domain was then confirmed manually using the InterProScan program (http://www.ebi.ac.uk/Tools/InterProScan/) [28]. Full-length protein sequences were aligned using Clustal X (version 1.8) [29]. Phylogenetic trees were constructed with MEGA 5.0 using the Neighbor-Joining (NJ) method with 1000 bootstrap replicates [30]. Evolutionary distances were computed using the p-distance method. WebLogo was used to display the distribution of amino-acid residues at each position of the conserved Cupin domain of the Gmcupins [28].
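The p-distance named above is the proportion of aligned sites at which two sequences differ. The following is a minimal Python sketch of that calculation, assuming the sequences are already aligned to equal length and that gapped columns are skipped (pairwise deletion); the source does not state which gap treatment was used in MEGA, and the function names and toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns in which either sequence carries a gap are skipped
    (pairwise deletion, assumed here for illustration).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(seqs: dict) -> dict:
    """Pairwise p-distances for a {name: aligned_sequence} mapping."""
    names = sorted(seqs)
    return {(x, y): p_distance(seqs[x], seqs[y]) for x in names for y in names}


# Hypothetical aligned fragments, not real Gmcupin sequences.
toy = {"gene_a": "MHVHPRAT-E", "gene_b": "MHIHPRATGE", "gene_c": "MHIHPKATGE"}
matrix = distance_matrix(toy)
print(round(matrix[("gene_a", "gene_b")], 4))  # 0.1111
```

A matrix like this is the kind of input a neighbor-joining routine clusters into a tree; the NJ step itself is not shown.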

Identification of conserved motifs

For the motif analysis, the deduced amino-acid sequences of the Gmcupins were analyzed with Multiple EM for Motif Elicitation (MEME) version 4.9.1 (http://meme.nbcr.net/meme/cgi-bin/meme.cgi) [31]. Structural motif annotation was performed using the Pfam (http://pfam.sanger.ac.uk), NCBI (http://www.ncbi.nlm.nih.gov/) and SMART (http://smart.embl-heidelberg.de) databases.

Genomic structure and chromosomal location of Gmcupins

The exon/intron organization of each Gmcupin gene was illustrated with the Gene Structure Display Server (GSDS) (http://gsds.cbi.pku.edu.cn/) [32] by comparing the cDNA sequences with their corresponding genomic DNA sequences in the Phytozome database (http://www.phytozome.net/gmax). The chromosomal locations of soybean cupin genes were mapped to the duplicated blocks using the Chromosome Visualization Tool (CViT) genome search and synteny viewer at the Legume Information System (http://comparative-legumes.org/) [33], [34].

Calculation of Ka/Ks values

Clustal X (version 1.8) was used for pairwise alignment of the paralogous nucleotide sequences [29]. Ka (non-synonymous substitution rate) and Ks (synonymous substitution rate) were estimated using DnaSP v5 [35]. The divergence time (T) was calculated as T = Ks/(2λ), where λ, the synonymous mutation rate for soybean, is 6.1×10⁻⁹ substitutions per site per year [26], [36], [37].
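The divergence-time formula can be checked directly against the values in Table 2. A minimal Python sketch, using only the λ value stated above; the helper names are hypothetical.

```python
SOYBEAN_LAMBDA = 6.1e-9  # synonymous substitutions per site per year (value used in the text)


def divergence_time_mya(ks: float, rate: float = SOYBEAN_LAMBDA) -> float:
    """Divergence time in million years from T = Ks / (2 * lambda)."""
    if ks < 0 or rate <= 0:
        raise ValueError("Ks must be non-negative and the rate positive")
    return ks / (2.0 * rate) / 1e6


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


# Gmcupin5.1/Gmcupin8.1 in Table 2: Ka = 0.0370, Ks = 0.1177
print(round(divergence_time_mya(0.1177), 2))  # 9.65
print(round(ka_ks_ratio(0.0370, 0.1177), 4))  # 0.3144
```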

Expression analysis of GmCupin genes

Genome-wide transcriptome (RNA-seq) data were downloaded from the SoyBase database (http://soybase.org/). The transcript data covered vegetative tissues (young leaf, root and nodule), seeds at seven developmental stages (10, 14, 21, 25, 28, 35 and 42 days after flowering, DAF) and reproductive tissues (flower, 1-cm pod, and pod shells at 10 and 14 DAF). All transcript data were analyzed with Cluster 3.0 [38], and the heat maps were viewed in Java Treeview [39].
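The expression values in Figure 4 are described as reads per kilobase per million (RPKM). A minimal Python sketch of that normalization, assuming raw read counts, gene (transcript) lengths in base pairs and the total number of mapped reads per library are available; SoyBase's actual processing pipeline is not described in the source, and all names and numbers below are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    length_kb = gene_length_bp / 1_000
    library_millions = total_mapped_reads / 1_000_000
    return read_count / (length_kb * library_millions)


def rpkm_table(counts: dict, lengths: dict, library_sizes: dict) -> dict:
    """Convert {gene: {tissue: count}} into {gene: {tissue: RPKM}}."""
    return {
        gene: {
            tissue: rpkm(n, lengths[gene], library_sizes[tissue])
            for tissue, n in per_tissue.items()
        }
        for gene, per_tissue in counts.items()
    }


# Hypothetical counts for two genes in two libraries.
counts = {"gene_x": {"root": 500, "leaf": 20}, "gene_y": {"root": 0, "leaf": 300}}
lengths = {"gene_x": 1_000, "gene_y": 750}
libraries = {"root": 10_000_000, "leaf": 20_000_000}
print(rpkm_table(counts, lengths, libraries))
```

Dividing by both gene length and library size is what makes values comparable across genes of different length and across tissues sequenced to different depths, which is the precondition for clustering them in a single heat map.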

Evolutionary analysis of Gmcupin genes

Single nucleotide polymorphisms (SNPs) of the Gmcupin genes were downloaded from the SoyKB database (http://soykb.org/), based on the resequencing of wild and cultivated soybean genomes [40]. The allele ratio at each SNP was analyzed separately in the wild and cultivated soybean populations. An SNP site whose allele ratio was reversed between the two populations (i.e., the predominant allele in wild soybeans became the minority allele in cultivars) was defined as a putative selected site during domestication.
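One way to operationalize the "reversed distribution" criterion is to compare the most frequent allele in each population. A minimal Python sketch, assuming that a reversal means the major allele differs between wild and cultivated samples and that ties are not called; this interpretation, the tie rule and the function names are assumptions, although every site listed in Table 3 satisfies it.

```python
import re
from typing import Dict, Optional


def parse_allele_counts(text: str) -> Dict[str, int]:
    """Parse a Table 3 style string such as '11C/5T' into {'C': 11, 'T': 5}."""
    return {allele: int(n) for n, allele in re.findall(r"(\d+)([ACGT])", text)}


def major_allele(counts: Dict[str, int]) -> Optional[str]:
    """Most frequent allele, or None when there is no clear majority (tie)."""
    if not counts:
        return None
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: treated as uninformative (assumption)
    return ranked[0][0]


def is_reversed_site(wild: str, cultivated: str) -> bool:
    """True when the major allele in wild soybeans differs from that in cultivars."""
    w = major_allele(parse_allele_counts(wild))
    c = major_allele(parse_allele_counts(cultivated))
    return w is not None and c is not None and w != c


# Gmcupin03.1 site 39840871 from Table 3: wild 11C/5T, cultivar 5C/9T.
print(is_reversed_site("11C/5T", "5C/9T"))  # True
```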

Results and Discussion

Identification of Cupin gene family in soybean

To identify the Cupin gene family in the soybean genome, BLASTP searches were performed against the G. max v1.1 genome using the conserved Cupin domain as the query, and the retrieved sequences were then used as secondary queries. A total of 69 non-redundant Cupin genes were identified in the soybean genome (Table 1). To confirm the conserved Cupin domain, all candidates were analyzed with the InterProScan program (http://www.ebi.ac.uk/Tools/InterProScan/). Soybean Cupin genes were named Gmcupin01.1 to Gmcupin20.4 according to their chromosomal locations. The identified Gmcupin genes encode peptides of 125–495 amino acids (average 224).
Table 1

Summary of Cupin family members in soybean.

Gene symbol | Gene locus | Primary transcript | Gene location | Amino acids | Exons
Gmcupin01.1 | Glyma01g04450 | Glyma01g04450.1 | Gm01:3990477-3991986 | 220 | 1
Gmcupin02.1 | Glyma02g01085 | Glyma02g01085.1 | Gm02:816290-817482 | 147 | 2
Gmcupin02.2 | Glyma02g03100 | Glyma02g03100.1 | Gm02:2414696-2415880 | 220 | 1
Gmcupin02.3 | Glyma02g05010 | Glyma02g05010.1 | Gm02:4077995-4078612 | 205 | 1
Gmcupin03.1 | Glyma03g32030 | Glyma03g32030.1 | Gm03:39840052-39842763 | 495 | 4
Gmcupin03.2 | Glyma03g38630 | Glyma03g38630.1 | Gm03:44934345-44933730 | 218 | 2
Gmcupin04.1 | Glyma04g39040 | Glyma04g39040.2 | Gm04:45306333-45307231 | 199 | 3
Gmcupin05.1 | Glyma05g25620 | Glyma05g25620.1 | Gm05:31685194-31686123 | 215 | 2
Gmcupin06.1 | Glyma06g15930 | Glyma06g15930.1 | Gm06:12516557-12517688 | 228 | 1
Gmcupin07.1 | Glyma07g04310 | Glyma07g04310.1 | Gm07:3163391-3164534 | 209 | 1
Gmcupin07.2 | Glyma07g04320 | Glyma07g04320.1 | Gm07:3167203-3168078 | 208 | 1
Gmcupin07.3 | Glyma07g04330 | Glyma07g04330.1 | Gm07:3173276-3174394 | 208 | 1
Gmcupin07.4 | Glyma07g04340 | Glyma07g04340.1 | Gm07:3179749-3180833 | 225 | 1
Gmcupin07.5 | Glyma07g04400 | Glyma07g04400.1 | Gm07:3202414-3203532 | 208 | 1
Gmcupin08.1 | Glyma08g08600 | Glyma08g08600.1 | Gm08:6134739-6134464 | 215 | 2
Gmcupin08.2 | Glyma08g24320 | Glyma08g24320.1 | Gm08:18508831-18509842 | 211 | 1
Gmcupin09.1 | Glyma09g03010 | Glyma09g03010.1 | Gm09:2110529-2109847 | 217 | 2
Gmcupin09.2 | Glyma09g08030 | Glyma09g08030.1 | Gm09:7066672-7066667 | 135 | 1
Gmcupin10.1 | Glyma10g08360 | Glyma10g08360.1 | Gm10:7201264-7200871 | 226 | 2
Gmcupin10.2 | Glyma10g11935 | Glyma10g11935.1 | Gm10:12509357-12509734 | 125 | 1
Gmcupin10.3 | Glyma10g28010 | Glyma10g28010.1 | Gm10:36807794-36807686 | 221 | 2
Gmcupin10.4 | Glyma10g28020 | Glyma10g28020.1 | Gm10:36812065-36811696 | 220 | 2
Gmcupin10.5 | Glyma10g28190 | Glyma10g28190.1 | Gm10:36981942-36980791 | 223 | 2
Gmcupin10.6 | Glyma10g31200 | Glyma10g31200.2 | Gm10:39762112-39761524 | 198 | 3
Gmcupin10.7 | Glyma10g31210 | Glyma10g31210.1 | Gm10:39768393-39768098 | 232 | 2
Gmcupin10.8 | Glyma10g42611 | Glyma10g42611.1 | Gm10:49519941-49520574 | 177 | 3
Gmcupin12.1 | Glyma12g09630 | Glyma12g09630.2 | Gm12:7391558-7392181 | 207 | 1
Gmcupin12.2 | Glyma12g09640 | Glyma12g09640.2 | Gm12:7398088-7398957 | 212 | 2
Gmcupin12.3 | Glyma12g09760 | Glyma12g09760.2 | Gm12:7531728-7532351 | 207 | 1
Gmcupin12.4 | Glyma12g31110 | Glyma12g31110.1 | Gm12:34711894-34712517 | 207 | 1
Gmcupin13.1 | Glyma13g16960 | Glyma13g16960.2 | Gm13:20815056-20816399 | 199 | 2
Gmcupin13.2 | Glyma13g18450 | Glyma13g18450.2 | Gm13:22109247-22113254 | 226 | 4
Gmcupin13.3 | Glyma13g22050 | Glyma13g22050.1 | Gm13:25624544-25624397 | 239 | 2
Gmcupin13.4 | Glyma13g40360 | Glyma13g40360.1 | Gm13:40856942-40859117 | 483 | 5
Gmcupin15.1 | Glyma15g05040 | Glyma15g05040.2 | Gm15:3611464-3613757 | 351 | 7
Gmcupin15.2 | Glyma15g13960 | Glyma15g13960.1 | Gm15:10534152-10533514 | 215 | 2
Gmcupin15.3 | Glyma15g19510 | Glyma15g19510.1 | Gm15:16833879-16835243 | 213 | 1
Gmcupin15.4 | Glyma15g35130 | Glyma15g35130.1 | Gm15:39672342-39673419 | 231 | 1
Gmcupin16.1 | Glyma16g00980 | Glyma16g00980.1 | Gm16:652712-653968 | 209 | 1
Gmcupin16.2 | Glyma16g00990 | Glyma16g00990.1 | Gm16:656546-657171 | 181 | 2
Gmcupin16.3 | Glyma16g01000 | Glyma16g01000.1 | Gm16:660570-661190 | 206 | 1
Gmcupin16.4 | Glyma16g06500 | Glyma16g06500.1 | Gm16:5853241-5853122 | 221 | 2
Gmcupin16.5 | Glyma16g06520 | Glyma16g06520.1 | Gm16:5858235-5858104 | 221 | 2
Gmcupin16.6 | Glyma16g06530 | Glyma16g06530.1 | Gm16:5860996-5862104 | 220 | 2
Gmcupin16.7 | Glyma16g06630 | Glyma16g06630.1 | Gm16:5947078-5946961 | 221 | 2
Gmcupin16.8 | Glyma16g06640 | Glyma16g06640.1 | Gm16:5951136-5952363 | 215 | 2
Gmcupin16.9 | Glyma16g07550 | Glyma16g07550.1 | Gm16:6838751-6839383 | 210 | 1
Gmcupin16.10 | Glyma16g07560 | Glyma16g07560.2 | Gm16:6844140-6843743 | 188 | 1
Gmcupin16.11 | Glyma16g07580 | Glyma16g07580.1 | Gm16:6860067-6860840 | 214 | 1
Gmcupin17.1 | Glyma17g05760 | Glyma17g05760.1 | Gm17:4052453-4053577 | 208 | 1
Gmcupin19.1 | Glyma19g09370 | Glyma19g09370.2 | Gm19:11189789-11189486 | 181 | 3
Gmcupin19.2 | Glyma19g09810 | Glyma19g09810.1 | Gm19:11530120-11531183 | 221 | 2
Gmcupin19.3 | Glyma19g09830 | Glyma19g09830.1 | Gm19:11554699-11555709 | 221 | 2
Gmcupin19.4 | Glyma19g09840 | Glyma19g09840.1 | Gm19:11601504-11602281 | 221 | 2
Gmcupin19.5 | Glyma19g09860 | Glyma19g09860.1 | Gm19:11630024-11631147 | 221 | 2
Gmcupin19.6 | Glyma19g09990 | Glyma19g09990.1 | Gm19:11922733-11922620 | 221 | 2
Gmcupin19.7 | Glyma19g24840 | Glyma19g24840.1 | Gm19:30509520-30510472 | 212 | 2
Gmcupin19.8 | Glyma19g24850 | Glyma19g24850.1 | Gm19:30514099-30513955 | 221 | 2
Gmcupin19.9 | Glyma19g24870 | Glyma19g24870.2 | Gm19:30532519-30533617 | 221 | 2
Gmcupin19.10 | Glyma19g24900 | Glyma19g24900.1 | Gm19:30559228-30560036 | 221 | 2
Gmcupin19.11 | Glyma19g24910 | Glyma19g24910.1 | Gm19:30576453-30577383 | 219 | 2
Gmcupin19.12 | Glyma19g27580 | Glyma19g27580.1 | Gm19:34882234-34884637 | 212 | 2
Gmcupin19.13 | Glyma19g34780 | Glyma19g34780.1 | Gm19:42366324-42369290 | 481 | 4
Gmcupin19.14 | Glyma19g41070 | Glyma19g41070.2 | Gm19:47390045-47391131 | 184 | 3
Gmcupin19.15 | Glyma19g41220 | Glyma19g41220.1 | Gm19:47524955-47524382 | 219 | 2
Gmcupin20.1 | Glyma20g22180 | Glyma20g22180.1 | Gm20:32106832-32105935 | 224 | 2
Gmcupin20.2 | Glyma20g25430 | Glyma20g25430.1 | Gm20:35111115-35111738 | 207 | 1
Gmcupin20.3 | Glyma20g36300 | Glyma20g36300.1 | Gm20:44453211-44454264 | 232 | 2
Gmcupin20.4 | Glyma20g36320 | Glyma20g36320.1 | Gm20:44458621-44459718 | 222 | 2
Multiple alignment analysis was performed to characterize the homologous domain sequences and the frequency of amino acids at each position of the Gmcupin domains, and MEME was used to identify putative cupin motifs. Two conserved domains, designated Gmcupin 1 and Gmcupin 2, were found in these Gmcupins; they consist of 59 and 52 amino acids, respectively. Seven highly conserved residues were identified in Gmcupin 1 (His-34, His-36, Pro-37, Glu-41, Gly-48, Gly-53 and Phe-54), and four in Gmcupin 2 (Gly-8, Pro-14, His-19 and Asn-23) (Figure 1). Previous reports showed that the histidines and glutamic acid(s) act as ligands for the active-site metal [18], [19], [41]. In addition, studies showed that a set of conserved histidine residues used for sugar binding in the ancestral non-enzymatic domain evolved into the metal-coordinating histidine residues of oxalate oxidase [42] and oxalate decarboxylase [43].
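The bit scores in the sequence logos measure how strongly each alignment column is conserved. A minimal Python sketch of per-column information content, computed as log2(20) minus the Shannon entropy of the observed residue frequencies; unlike WebLogo's default behavior it applies no small-sample correction, and the 3-bit cut-off and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20, gap: str = "-") -> float:
    """Information content (bits) of one alignment column, gaps ignored."""
    residues = [r for r in column.upper() if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def conserved_positions(alignment: list, threshold_bits: float = 3.0) -> list:
    """Return (1-based position, consensus residue, bits) for columns above the cut-off."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for position, column in enumerate(zip(*alignment), start=1):
        col = "".join(column)
        bits = column_information(col)
        if bits >= threshold_bits:
            consensus = Counter(r for r in col.upper() if r != "-").most_common(1)[0][0]
            hits.append((position, consensus, round(bits, 2)))
    return hits


# Hypothetical three-sequence alignment: columns 1 and 3 are invariant histidines.
print(conserved_positions(["HAH", "HCH", "HDH"]))
```

An invariant column reaches the maximum of log2(20) ≈ 4.32 bits, which is why fully conserved residues such as the metal-binding histidines stand out as the tallest stacks in a logo.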
Figure 1

Conserved domains across cupin proteins in soybean.

The sequence logos are based on alignments of 69 Gmcupin domains. Multiple alignment analysis of all typical Gmcupin domains (A: Gmcupin 1; B: Gmcupin 2) was performed with Clustal W. The bit score indicates the information content at each position in the sequence.


Phylogenetic relationships and gene structure of soybean Cupin genes

The abundance of Gmcupin genes may derive from multiple gene duplication events, namely a whole-genome duplication followed by multiple segmental and tandem duplications [44]. To examine the phylogenetic relationships among the Gmcupin proteins, an unrooted tree was constructed from alignments of their full-length amino-acid sequences (Figure 2). The Gmcupin gene family was classified into ten subgroups (I–X), each with 2–22 members. The very high bootstrap values within each subgroup except subgroup I suggest a common origin for the Gmcupin genes in each group. Surprisingly, 12 of the 15 Gmcupin genes on chromosome 19 (80%) were classified into subgroup I, and five of them (Gmcupin19.2, Gmcupin19.3, Gmcupin19.4, Gmcupin19.5 and Gmcupin19.6) showed the same base composition. The tree topology revealed 22 Gmcupin pairs at terminal nodes that shared high similarity; these were therefore assigned as paralogous pairs (homologous genes that diverged by gene duplication; Figure 2). These paralogous pairs account for more than 63% of the entire Gmcupin family and share 77.2%–100% sequence similarity (Table S1), implying that these genes may have evolved from a recent soybean genome duplication event [45].
Figure 2

Phylogenetic relationships and gene structure of Gmcupin genes.

The phylogenetic tree of Gmcupin proteins was constructed from a complete alignment of 69 Gmcupin proteins using MEGA 5.0 with the neighbor-joining method. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed. Bootstrap percentages > 50% are indicated on the nodes. Ten major phylogenetic subgroups (designated I to X) are indicated. Exons of Gmcupin genes are represented by green boxes, and introns and untranslated regions (UTRs) by black and blue lines, respectively. The sizes of exons and introns can be estimated using the scale below.

As gene structural diversity is a possible explanation for the evolution of multigene families, the exon/intron organization of the coding sequences of each Gmcupin gene was compared. According to their predicted structures, closely related Gmcupin members within the same subgroup had highly similar gene structures, and the positions and lengths of introns were almost completely conserved (Fig. 2A). For instance, most Gmcupin genes in subgroups I, II and III contained a single intron, except Gmcupin19.4, Gmcupin10.2 and Gmcupin10.6 in subgroup I and Gmcupin19.14 in subgroup III. Meanwhile, no introns were identified in 23 of the 30 Gmcupin genes in subgroups IV, V, VI, VII, VIII and IX. In contrast, the gene structures in subgroup X were more variable and displayed the largest number of exon/intron structure variants among the Gmcupin genes. The dissimilarity of intron phases between subgroups and their conservation within subgroups mutually support the results of the phylogenetic and genome duplication analyses.

Chromosomal location and duplication of soybean Cupin genes

As shown in Figure 3, Gmcupin genes were non-randomly distributed on 17 of the 20 chromosomes. Fifteen Cupin genes were located on chromosome 19 and eleven on chromosome 16. In contrast, no more than two Gmcupin genes were located on each of eleven chromosomes, including chromosomes 11, 14 and 18, which carried no Cupin genes at all. Most Gmcupins were substantially clustered on several chromosomes, especially those with high gene densities. Specifically, 10 Gmcupin genes on chromosome 16 were arranged in four clusters, each spanning less than 9 kb (Gmcupin16.1, Gmcupin16.2 and Gmcupin16.3 within 8.5 kb; Gmcupin16.4, Gmcupin16.5 and Gmcupin16.6 within 8.8 kb; Gmcupin16.7 and Gmcupin16.8 within 5.3 kb; Gmcupin16.9 and Gmcupin16.10 within 5 kb), and the remaining gene, Gmcupin16.11, lies close to its neighbor Gmcupin16.10 within a 1.7-kb segment. Similarly, on chromosome 19, Gmcupin19.7 and Gmcupin19.8 are located within a 4.5-kb segment, and Gmcupin19.10 and Gmcupin19.11 within a 19-kb segment.
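The physical clustering described above can be recovered from the coordinates in Table 1. A minimal Python sketch, assuming that genes on the same chromosome belong to one cluster when the gap to the preceding gene is at most a hypothetical 10 kb (the source does not state a cut-off); coordinates listed in reverse order are normalized first, so computed spans need not match the rounded values quoted in the text exactly.

```python
from typing import List, Tuple

# (gene name, chromosome, coordinate 1, coordinate 2); the coordinate order may be reversed
Gene = Tuple[str, str, int, int]


def find_clusters(genes: List[Gene], max_gap_bp: int = 10_000) -> List[Tuple[List[str], int]]:
    """Group neighbouring genes on a chromosome; return (member names, span in bp)."""
    spans = sorted((chrom, min(a, b), max(a, b), name) for name, chrom, a, b in genes)
    clusters = []
    current = []
    current_end = 0
    for chrom, start, end, name in spans:
        joins = bool(current) and chrom == current[0][0] and start - current_end <= max_gap_bp
        if joins:
            current.append((chrom, start, end, name))
            current_end = max(current_end, end)
        else:
            if len(current) > 1:  # keep only multi-gene groups
                clusters.append(current)
            current = [(chrom, start, end, name)]
            current_end = end
    if len(current) > 1:
        clusters.append(current)
    return [
        ([member[3] for member in cluster],
         max(member[2] for member in cluster) - min(member[1] for member in cluster))
        for cluster in clusters
    ]


# Coordinates taken from Table 1.
table1_subset = [
    ("Gmcupin16.1", "Gm16", 652712, 653968),
    ("Gmcupin16.2", "Gm16", 656546, 657171),
    ("Gmcupin16.3", "Gm16", 660570, 661190),
    ("Gmcupin17.1", "Gm17", 4052453, 4053577),
]
print(find_clusters(table1_subset))
```

For the first chromosome 16 cluster the computed span is 8,478 bp, consistent with the "within 8.5 kb" stated above.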
Figure 3

Chromosomal locations and predicted clusters for Gmcupin genes.

The schematic diagram of genome-wide chromosome organization and segmental duplication arising from the genome duplication event in soybean was derived from the CViT genome search and synteny viewer at the Legume Information System (http://comparative-legumes.org). The chromosomal positions of all Gmcupin genes were mapped on each chromosome. Colored blocks to the left of each chromosome show duplications with chromosomes of the same color. The chromosome numbers are indicated at the top of each bar and sizes of chromosomes are represented by the vertical scale. The locations of centromeric repeats are shown as black rectangles over the chromosomes.

The soybean genome is thought to have undergone at least two rounds of whole-genome duplication, followed by multiple segmental duplications, tandem duplications and transposition events (e.g., retroposition and replicative transposition) [45]. A tandem duplication event was defined by the presence of two or more genes on the same chromosome, whereas a segmental duplication event was defined as gene duplication on different chromosomes [46]. Segmental duplication, tandem duplication and transposition events are the major causes of gene-family expansion. To reveal the potential relationship between putative paralogous pairs of Gmcupin genes and segmental duplications, the Gmcupin genes were mapped with the CViT genome search and synteny viewer (http://comparative-legumes.org/) [33]. The distribution of Gmcupin genes relative to the corresponding duplicated blocks is illustrated in Figure 3. Within the duplicated blocks associated with a duplication event, 18 of the 22 putative paralogous pairs (81.8%) were preferentially retained duplicates located in duplicated regions, with 13 pairs located in segmental duplications of long fragments (> 1 Mb) and 4 in segmental duplications of short fragments (< 1 Mb; Table 2). 
A further putative paralogous pair (Gmcupin10.3/Gmcupin10.4) probably arose by tandem duplication in the same orientation. Taken together, these results suggest that long segmental duplication was the predominant mode of Gmcupin gene evolution, possibly accompanied by tandem duplication.
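The tandem/segmental definitions above reduce to a chromosome comparison. A minimal Python sketch that applies only that chromosome criterion to the pairs listed in Table 2; it does not check whether the genes lie in syntenic duplicated blocks (as the CViT mapping does) or whether same-chromosome copies are adjacent, and the helper names are hypothetical.

```python
import re
from collections import Counter


def chromosome_of(gene: str) -> int:
    """Chromosome number encoded in a Gmcupin name, e.g. 'Gmcupin10.3' -> 10."""
    match = re.fullmatch(r"Gmcupin(\d+)\.\d+", gene)
    if match is None:
        raise ValueError(f"unexpected gene name: {gene}")
    return int(match.group(1))


def duplication_type(gene1: str, gene2: str) -> str:
    """Same chromosome -> 'tandem'; different chromosomes -> 'segmental'."""
    return "tandem" if chromosome_of(gene1) == chromosome_of(gene2) else "segmental"


# Paralogous pairs in the order listed in Table 2.
TABLE2_PAIRS = [
    ("Gmcupin5.1", "Gmcupin8.1"), ("Gmcupin9.1", "Gmcupin15.1"), ("Gmcupin3.1", "Gmcupin19.13"),
    ("Gmcupin10.5", "Gmcupin20.1"), ("Gmcupin1.1", "Gmcupin2.2"), ("Gmcupin10.1", "Gmcupin13.3"),
    ("Gmcupin13.1", "Gmcupin17.1"), ("Gmcupin7.1", "Gmcupin16.1"), ("Gmcupin16.2", "Gmcupin7.4"),
    ("Gmcupin4.1", "Gmcupin6.1"), ("Gmcupin8.2", "Gmcupin15.4"), ("Gmcupin10.7", "Gmcupin20.3"),
    ("Gmcupin10.6", "Gmcupin20.4"), ("Gmcupin16.8", "Gmcupin19.12"), ("Gmcupin19.13", "Gmcupin3.1"),
    ("Gmcupin13.4", "Gmcupin15.1"), ("Gmcupin19.15", "Gmcupin3.2"), ("Gmcupin10.3", "Gmcupin10.4"),
]
print(Counter(duplication_type(a, b) for a, b in TABLE2_PAIRS))
```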
Table 2

Duplicated Cupin genes in soybean and the dates of the duplication blocks.

Gene 1 | Gene 2 | Duplication | Fragment size | Ka | Ks | Ka/Ks | Date (Mya)
Gmcupin5.1 | Gmcupin8.1 | Fragment | Large | 0.0370 | 0.1177 | 0.3144 | 9.65
Gmcupin9.1 | Gmcupin15.1 | Fragment | Large | 0.0328 | 0.1441 | 0.2276 | 11.81
Gmcupin3.1 | Gmcupin19.13 | Fragment | Large | 0.0576 | 0.1479 | 0.3895 | 12.12
Gmcupin10.5 | Gmcupin20.1 | Fragment | Small | 0.0181 | 0.1073 | 0.1687 | 8.80
Gmcupin1.1 | Gmcupin2.2 | Fragment | Large | 0.0161 | 0.1972 | 0.0816 | 16.16
Gmcupin10.1 | Gmcupin13.3 | Fragment | Large | 0.0457 | 0.1493 | 0.3061 | 12.24
Gmcupin13.1 | Gmcupin17.1 | Fragment | Small | 0.1186 | 0.1838 | 0.6453 | 15.07
Gmcupin7.1 | Gmcupin16.1 | Fragment | Large | 0.0173 | 0.1582 | 0.1094 | 12.97
Gmcupin16.2 | Gmcupin7.4 | Fragment | Large | 0.0404 | 0.1916 | 0.2109 | 15.70
Gmcupin4.1 | Gmcupin6.1 | Fragment | Large | 0.1079 | 0.2469 | 0.4370 | 20.24
Gmcupin8.2 | Gmcupin15.4 | Fragment | Small | 0.0290 | 0.0965 | 0.3005 | 7.91
Gmcupin10.7 | Gmcupin20.3 | Fragment | Large | 0.0270 | 0.1925 | 0.1403 | 15.78
Gmcupin10.6 | Gmcupin20.4 | Fragment | Large | 0.0330 | 0.1511 | 0.2184 | 12.39
Gmcupin16.8 | Gmcupin19.12 | Fragment | Small | 0.2215 | 0.3497 | 0.6334 | 28.66
Gmcupin19.13 | Gmcupin3.1 | Fragment | Large | 0.0576 | 0.1479 | 0.3895 | 12.12
Gmcupin13.4 | Gmcupin15.1 | Fragment | Large | 0.0658 | 0.1742 | 0.3777 | 14.28
Gmcupin19.15 | Gmcupin3.2 | Fragment | Large | 0.0239 | 0.1799 | 0.1329 | 14.75
Gmcupin10.3 | Gmcupin10.4 | Tandem repeat | - | 0.0351 | 0.0909 | 0.3861 | 7.45
To investigate whether Darwinian positive selection was involved in the divergence of Gmcupin genes after duplication, and to date the duplication blocks, the substitution rate ratios (Ka/Ks) of the 18 paralogous pairs were calculated using DnaSP. Ks was used to estimate the approximate dates of the duplication events. The duplications of the Gmcupin genes were estimated to have occurred from 7.45 Mya (million years ago; Ks = 0.0909) to 28.66 Mya (Ks = 0.3497), with a mean of 13.78 Mya (Ks = 0.1682; Table 2). The Ks of the tandem duplication of Gmcupin10.3 and Gmcupin10.4 was 0.0909, dating that event to 7.45 Mya. Given that the soybean genome underwent two polyploidy events, at approximately 13 and 58 Mya, most segmental duplications of the Gmcupin genes appear to date to around 13 Mya, when the Glycine-specific duplication occurred in the soybean genome [26]. Generally, a Ka/Ks ratio below 1 indicates functional constraint, i.e., purifying (negative) selection on the genes. In this study, the Ka/Ks ratios of 8 segmental duplication pairs were below 0.3, whereas those of the other 9 segmental duplication pairs and the tandem duplication pair were above 0.3, indicating possible functional divergence of some Gmcupin genes after the duplication events. Two of these pairs (Gmcupin16.8/19.12 and Gmcupin13.1/17.1) had Ka/Ks ratios above 0.6 (Table 2), suggesting relatively rapid evolution following duplication. Nevertheless, because all ratios were well below 1, we conclude that the Gmcupin gene family experienced strong purifying selection with limited functional divergence after segmental duplications.
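The counts and the mean quoted above can be recomputed from Table 2. A minimal Python sketch using the tabulated Ka and Ks values; note that the reported mean (Ks = 0.1682, 13.78 Mya) is reproduced when the tandem pair is averaged together with the 17 segmental rows, including the repeated Gmcupin3.1/Gmcupin19.13 entry. Variable and function names are hypothetical.

```python
from statistics import mean

LAMBDA = 6.1e-9  # synonymous substitutions per site per year, as in Methods

# (Ka, Ks) for the 17 segmental rows of Table 2, in table order
SEGMENTAL = [
    (0.0370, 0.1177), (0.0328, 0.1441), (0.0576, 0.1479), (0.0181, 0.1073),
    (0.0161, 0.1972), (0.0457, 0.1493), (0.1186, 0.1838), (0.0173, 0.1582),
    (0.0404, 0.1916), (0.1079, 0.2469), (0.0290, 0.0965), (0.0270, 0.1925),
    (0.0330, 0.1511), (0.2215, 0.3497), (0.0576, 0.1479), (0.0658, 0.1742),
    (0.0239, 0.1799),
]
TANDEM = (0.0351, 0.0909)  # Gmcupin10.3/Gmcupin10.4


def split_by_ratio(pairs, threshold=0.3):
    """Count pairs with Ka/Ks below and at-or-above the threshold."""
    ratios = [ka / ks for ka, ks in pairs]
    below = sum(1 for r in ratios if r < threshold)
    return below, len(ratios) - below


mean_ks = mean(ks for _, ks in SEGMENTAL + [TANDEM])
print(split_by_ratio(SEGMENTAL))                              # (8, 9)
print(round(mean_ks / (2 * LAMBDA) / 1e6, 2))                 # 13.78
print(max(ka / ks for ka, ks in SEGMENTAL + [TANDEM]) < 1)    # True
```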

Differential expression profile of Gmcupin genes

To characterize the expression profiles of Gmcupin genes, we analyzed publicly available RNA-Seq data from seven soybean tissues, three pod developmental stages and seven seed developmental stages. Thirty-five Gmcupin genes had sequence reads in at least one tissue, and most of them showed a distinct tissue-specific expression pattern (Figure 4). For example, two genes (Gmcupin17.1 and Gmcupin15.3) showed significantly higher transcript accumulation in young leaves. Gmcupin16.8 was mainly expressed during pod development, whereas Gmcupin16.5, Gmcupin03.2 and Gmcupin20.4 were specifically expressed in roots. In addition, three genes of subfamily X (Gmcupin03.1, Gmcupin13.2 and Gmcupin19.13) were highly expressed at the later stages of seed development. Most Gmcupin genes showed relatively low expression in nodules (Figure 4). Based on their expression patterns, the genes were clustered into five groups (A–E) across soybean tissues (except seeds) and into four groups (I–IV) across the seven seed developmental stages (Figure 5). The genes in clusters A–E were mainly expressed in flower/root, root, pod/root, young leaf and pod, respectively. Six genes in cluster I were mainly expressed during the early stages of seed development, while seven genes in clusters II and III were mainly expressed during the later stages. In particular, three genes in cluster III showed much higher and specific expression from 25 to 42 days after flowering (DAF). Genes of cluster IV were expressed in most stages of seed development.
Figure 4

Expression profile of Gmcupin genes in different tissues.

The numbers in the expression profile are normalized data, calculated as reads per kilobase per million (RPKM) normalization of the raw data. All data were downloaded from SoyBase.

Figure 5

Expression profiles of 35 expressed Gmcupin genes in different tissues.

a. Heatmap showing hierarchical clustering of 35 expressed Gmcupin genes among various tissues analyzed. b. Heatmap showing hierarchical clustering of 35 expressed Gmcupin genes during the development of soybean seeds.

The evolutionary fates of duplicated genes can be classified as subfunctionalization (partitioning of original functions), neofunctionalization (acquisition of novel functions) or nonfunctionalization (loss of original functions) [47]. Given the high proportion of segmental/tandem duplications, we investigated the functional redundancy of Gmcupin genes. Six paralogous pairs derived from segmental duplications (Gmcupin03.1/19.13, Gmcupin07.1/16.1, Gmcupin10.1/13.3, Gmcupin01.1/02.2, Gmcupin16.8/19.12 and Gmcupin10.7/20.3) and one pair derived from tandem duplication (Gmcupin10.3/10.4) shared almost identical expression patterns. In contrast, the expression patterns of seven other paralogous pairs (Gmcupin17.1/13.1, Gmcupin08.2/15.4, Gmcupin04.1/06.1, Gmcupin12.4/20.2, Gmcupin10.5/20.1, Gmcupin10.6/20.4 and Gmcupin3.2/19.5) diverged significantly. These findings indicate that the expression profiles of many Gmcupins have diverged substantially after segmental/tandem duplication. We therefore speculate that Gmcupins have been retained through substantial subfunctionalization during soybean evolution.

Artificial selection analysis for Gmcupins during soybean domestication

Thirty-five Gmcupin genes were analyzed for selection effects during soybean domestication by comparing sequence diversity between 17 wild soybean accessions and 14 cultivars. A SNP was defined as a strongly selected site when its allele distribution was reversed between wild and cultivated soybeans, that is, when the allele predominating in cultivars was the minor (or, at most, equally frequent) allele in wild soybeans; Cupin genes with one or more such sites were assumed to have undergone artificial selection during domestication. Sixteen Gmcupins had selected sites: eight had more than one selected site and eight had a single selected site (Table 3). All SNP sites in Gmcupin10.3 and Gmcupin07.2 were selected sites, suggesting that these genes may have undergone strong selection during soybean domestication. Interestingly, selected sites were also identified in Gmcupin03.1 (7 sites), Gmcupin13.2 (1 site) and Gmcupin19.13 (1 site), which are highly expressed at the later stages of seed development.

The genetic diversity of most Gmcupins declined sharply in cultivars compared with wild soybeans. However, Gmcupin10.7, which is expressed specifically in roots, showed three haplotypes in wild soybeans but four in cultivars, suggesting that a new Gmcupin10.7 haplotype arose during domestication under artificial selection and may have acquired new functions. These selected genes point to important roles for Gmcupins in soybean domestication and in breeding soybean to meet human needs.
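The haplotype comparison for Gmcupin10.7 rests on counting the distinct combinations of SNP alleles carried by each accession. The section does not name the diversity statistic used, so the sketch below assumes haplotypes are given as strings of concatenated alleles and reports the number of distinct haplotypes together with Nei's haplotype diversity, H = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n accessions. The function name and input strings are hypothetical.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Return (number of distinct haplotypes, Nei's haplotype diversity).

    haplotypes: one string per accession, formed by concatenating the
    alleles it carries at each SNP site of the gene (e.g. "TCA").
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    if n < 2:
        return len(counts), 0.0  # with one sample, report zero diversity by convention
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    diversity = n * (1 - sum_sq) / (n - 1)  # unbiased (n / (n - 1)) estimator
    return len(counts), diversity


# Hypothetical haplotypes at two SNP sites; not data from the study.
wild = ["TC", "TC", "TA", "CC", "TC"]
cultivated = ["TC", "TC", "TC", "TC", "CA"]
print("wild:", haplotype_summary(wild))
print("cultivated:", haplotype_summary(cultivated))
```

Reporting both numbers matters: a new haplotype can raise the haplotype count in cultivars even while overall diversity falls because one haplotype dominates.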
Table 3

Selected sites of Gmcupin genes during soybean domestication.

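| Gene | Position (bp) | Allele counts in wild soybeans | Allele counts in cultivars |
|---|---|---|---|
| Gmcupin03.1 | 39840871 | 11C/5T | 5C/9T |
| | 39841560 | 8G/7T | 0G/13T |
| | 39841777 | 8C/3T | 3C/8T |
| | 39842153 | 10T/6A | 6T/8A |
| | 39842186 | 9T/5C | 4T/7C |
| | 39842508 | 10A/5G | 5A/9G |
| | 39842704 | 10T/6A | 5T/8A |
| Gmcupin07.1 | 3163748 | 12A/4C | 4A/9C |
| | 3164179 | 10A/5T | 4A/9T |
| Gmcupin07.2 | 3167526 | 13T/4C | 4T/10C |
| | 3167819 | 12T/4C | 4T/10C |
| | 3167944 | 14G/3T | 4G/10T |
| | 3168029 | 13T/2C | 4T/8C |
| | 3168069 | 13T/2A | 4T/8A |
| Gmcupin07.4 | 3179898 | 14T/2C | 4T/10C |
| Gmcupin10.1 | 7200497 | 9C/6A | 2C/12A |
| | 7200888 | 13A/4G | 2A/12G |
| | 7200956 | 8G/8A | 2G/12A |
| | 7201734 | 12T/4C | 2T/12C |
| Gmcupin10.3 | 36808227 | 9T/7A | 1T/13A |
| | 36808469 | 10G/5A | 1G/13A |
| | 36808488 | 10A/5G | 1A/13G |
| | 36808586 | 10A/4T | 1A/13T |
| Gmcupin10.4 | 36812510 | 9C/6T | 1C/13T |
| | 36812530 | 10G/6A | 1G/13A |
| | 36812729 | 8G/6C | 1G/13C |
| Gmcupin10.7 | 39767387 | 14T/1C | 4T/5C |
| | 39768245 | 11T/1A | 4T/6A |
| Gmcupin13.1 | 20816255 | 13C/4A | 5C/8A |
| Gmcupin13.2 | 22112820 | 9C/7G | 5C/9G |
| Gmcupin16.4 | 5853162 | 12C/4G | 6C/8G |
| | 5853378 | 11A/6G | 6A/7G |
| | 5853406 | 11G/5T | 6G/8T |
| Gmcupin16.8 | 5951996 | 10T/4C | 3T/7C |
| Gmcupin17.1 | 4052528 | 10A/5C | 4A/9C |
| Gmcupin19.13 | 42367147 | 10C/4T | 6C/8T |
| Gmcupin19.15 | 47524873 | 8A/8G | 5A/9G |
| Gmcupin20.1 | 32106107 | 9G/8T | 1G/12T |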
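The selection criterion is stated only qualitatively (a "reverse distribution" of SNP alleles between wild and cultivated soybeans). The sketch below encodes one assumed reading of it: a site is flagged when the allele most frequent in cultivars has a frequency of at most 0.5 among wild accessions. The "at most" allows for tied wild counts such as 8G/8A at position 7200956 in Gmcupin10.1. The function names are hypothetical; the example rows are copied from Table 3.

```python
import re


def parse_counts(field):
    """Parse an allele-count field such as '11C/5T' into {'C': 11, 'T': 5}."""
    counts = {}
    for part in field.split("/"):
        m = re.fullmatch(r"(\d+)([ACGT])", part)
        if m is None:
            raise ValueError(f"unexpected allele field: {part!r}")
        counts[m.group(2)] = int(m.group(1))
    return counts


def is_reversed(wild_field, cultivar_field):
    """Assumed reading of a 'reverse distribution' between wild and cultivated soybeans.

    A site is flagged when the allele most frequent in cultivars has a
    frequency of at most 0.5 among wild accessions.
    """
    wild = parse_counts(wild_field)
    cultivar = parse_counts(cultivar_field)
    cultivar_major = max(cultivar, key=cultivar.get)
    return wild.get(cultivar_major, 0) / sum(wild.values()) <= 0.5


# Rows copied from Table 3: (gene, position, wild counts, cultivar counts).
rows = [
    ("Gmcupin03.1", 39840871, "11C/5T", "5C/9T"),
    ("Gmcupin03.1", 39841560, "8G/7T", "0G/13T"),
    ("Gmcupin10.1", 7200956, "8G/8A", "2G/12A"),
    ("Gmcupin10.7", 39767387, "14T/1C", "4T/5C"),
]
for gene, pos, wild_field, cultivar_field in rows:
    print(gene, pos, is_reversed(wild_field, cultivar_field))  # all True
```

Applied to every row of Table 3, this rule flags all 38 listed sites, so it is at least consistent with the published table; it cannot show which unlisted sites the study rejected.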
Pairwise identities between homologous pairs of Cupin genes from soybean (DOC).
References

1. J M Dunwell, A Culham, C E Carter, C R Sosa-Aguirre, P W Goodenough. Evolution of functional diversity in the cupin superfamily. Trends Biochem Sci. 2001-12.

2. Jim M Dunwell, Alan Purvis, Sawsan Khuri. Cupins: the most functionally diverse protein superfamily? Phytochemistry. 2004-01.

3. Ruchi Anand, Pieter C Dorrestein, Cynthia Kinsland, Tadhg P Begley, Steven E Ealick. Structure of oxalate decarboxylase from Bacillus subtilis at 1.75 Å resolution. Biochemistry. 2002-06-18.

4. M J L de Hoon, S Imoto, J Nolan, S Miyano. Open source clustering software. Bioinformatics. 2004-02-10.

5. L C van Loon, M Rep, C M J Pieterse. Significance of inducible defense-related proteins in infected plants. Annu Rev Phytopathol. 2006.

6. J M Dunwell. Cupins: a new superfamily of functionally diverse proteins that include germins and plant storage proteins. Biotechnol Genet Eng Rev. 1998.

7. E W Thompson, B G Lane. Relation of protein synthesis in imbibing wheat embryos to the cell-free translational capacities of bulk mRNA from dry and imbibing embryos. J Biol Chem. 1980-06-25.

8. T Yamahara, T Shiono, T Suzuki, K Tanaka, S Takio, K Sato, S Yamazaki, T Satoh. Isolation of a germin-like protein with manganese superoxide dismutase activity from cells of a moss, Barbula unguiculata. J Biol Chem. 1999-11-19.

9. Fabiola León-Galván, Ahuizolt de Jesús Joaquín-Ramos, Irineo Torres-Pacheco, Ana P Barba de la Rosa, Lorenzo Guevara-Olvera, Mario M González-Chavira, Rosalía V Ocampo-Velazquez, Enrique Rico-García, Ramón Gerardo Guevara-González. A germin-like protein gene (CchGLP) of Capsicum chinense Jacq. is induced during incompatible interactions and displays Mn-superoxide dismutase activity. Int J Mol Sci. 2011-10-25.

10. E Quevillon, V Silventoinen, S Pillai, N Harte, N Mulder, R Apweiler, R Lopez. InterProScan: protein domains identifier. Nucleic Acids Res. 2005-07-01.
