
Genome-wide identification and expression analysis of bZIP gene family in Carthamus tinctorius L.

Haoyang Li1, Lixia Li1, Guodong ShangGuan1, Chang Jia1, Sinan Deng1, Muhammad Noman1, Yilin Liu1, Yongxin Guo1, Long Han1, Xiaomei Zhang1, Yuanyuan Dong1, Naveed Ahmad1, Linna Du2, Haiyan Li3, Jing Yang4.   

Abstract

The basic leucine zipper (bZIP) family is a widely known transcription factor family in eukaryotes. In plants, bZIP proteins play crucial roles in various biological functions such as growth and development, seed maturation, and responses to light signals and environmental stress. To date, the bZIP protein family has been comprehensively identified in Arabidopsis, castor, rice, ramie, soybean and other plant species; however, a complete genome-wide investigation of the Carthamus tinctorius bZIP family has not yet been reported. Here, we identified 52 putative bZIP genes from Carthamus tinctorius using a draft genome assembly and analyzed their evolutionary classification, physicochemical properties, conserved domains, functional differentiation and expression levels in different tissues. Based on the common bZIP domain, the CtbZIP family was clustered into 12 subfamilies (A-J, S and X), of which X is unique to Carthamus tinctorius. A total of 20 conserved protein motifs were found in CtbZIP proteins. Expression profiling of the CtbZIP genes revealed tissue-specific patterns. Furthermore, changes in CtbZIP transcript abundance suggested that their transcriptional regulation could be strongly influenced by light intensity and hormones. Collectively, this study characterizes the functional and regulatory features of the bZIP transcription factor family in Carthamus tinctorius, whose members may serve as candidates for future functional characterization.


Year:  2020        PMID: 32968100      PMCID: PMC7511407          DOI: 10.1038/s41598-020-72390-z

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Transcription factors (TFs) are sequence-specific DNA-binding proteins that interact with relevant cis-acting elements[1]. This interaction alters transcription activity and stimulates or suppresses gene expression[2]. As switches of gene expression, TFs play important regulatory roles in almost all processes of plant life[3]. TFs are functionally characterized in depth through the biological processes and transcriptional regulatory networks in which they act, and they are significant components of abiotic stress signaling pathways. Among them, the basic leucine zipper (bZIP) proteins form a large and diverse TF family. All bZIP TFs consist of two structural components: a basic region (N-x7-R/K-x9) for sequence-specific DNA binding, and a leucine zipper. In Arabidopsis, 78 bZIP genes have been reported[4], compared with 89 in Oryza sativa[5], 55 in Vitis vinifera[6], 125 in Zea mays[7], 247 in Brassica napus L.[8], 92 in Sorghum vulgare[9] and 131 in Glycine max[10]. In general, putative bZIP genes are classified into ten or more groups on the basis of sequence similarities in their basic regions and conserved motifs. The AtbZIP gene family was classified into 13 subfamilies in Arabidopsis thaliana[4], while the predicted bZIP proteins of Oryza sativa were divided into 11 groups based on DNA-binding specificity and amino acid sequence[11]. Although the number of bZIP groups differs between Oryza sativa (11) and Arabidopsis thaliana (13), the relationships among the groups are similar. Interspecies clustering also indicates that homologous bZIPs from both species fall into common groups[3]. In plants, bZIP TFs regulate many transcriptional response pathways in multiple biological processes. They regulate the development of tissues and organs, including seed maturation and germination, embryogenesis[12], flowering[13] and photomorphogenesis[14].
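The basic-region signature quoted above (N-x7-R/K-x9) can be written as a regular expression. The sketch below is illustrative only and is not the identification method used in this study. It assumes the leucine zipper can be approximated as three leucines spaced seven residues apart (L-x6-L-x6-L) directly after the basic region. The function name and the toy sequence are hypothetical.

```python
import re

# Basic region from the text: N, any 7 residues, R or K, any 9 residues.
BASIC_REGION = r"N.{7}[RK].{9}"
# Assumption: the zipper is approximated as L-x6-L-x6-L right after the basic region.
BZIP_PATTERN = re.compile(BASIC_REGION + r"L.{6}L.{6}L")


def find_bzip_signature(protein):
    """Return 1-based inclusive (start, end) of the first match, or None."""
    match = BZIP_PATTERN.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.end()


# Hypothetical toy sequence built to satisfy the pattern; not a real CtbZIP protein.
toy = "MS" + "N" + "A" * 7 + "R" + "A" * 9 + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "GG"
print(find_bzip_signature(toy))
```

A pattern this loose will produce false positives on real proteomes, which is one reason profile-based searches (HMMs) are generally preferred for family identification.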
In addition, bZIP TFs are involved in responses to abiotic and biotic stresses such as extreme temperatures, water deficit, high osmolarity and salinity, and in defence against pathogens[15,16]. AtbZIP TFs such as TGA2, TGA5 and TGA6 regulate salicylic acid-dependent mechanisms and activate jasmonic acid- and ethylene-dependent pathways under abiotic stress, while ABF3 and ABF4 play essential roles in ABA stress responses[17]. HY5 and HYH, the main regulators of photomorphogenesis, mediate the light response in Arabidopsis thaliana[18]. Similarly, OsBZ8 is induced by ABA and mediates salt resistance[19], while LIP19 functions as a molecular switch for low-temperature signal transduction in rice[20].
Carthamus tinctorius L. (safflower) is an annual plant of the Asteraceae and an economic crop grown for the edible oil extracted from its seeds. It tolerates environments with insufficient moisture[21] and can therefore be grown on marginal agricultural lands that are suitable for few other crops[20,21]. It is a minor oilseed crop in terms of production and remains underexploited, yet it has both high ornamental value and other important practical uses. Safflower seeds are rich in oil and contain high levels of unsaturated fatty acids, vitamin E and oryzanol. Safflower seed oil is used extensively in the fuel industry and in cardiovascular health care, and safflower also serves as a plant factory for the production of pharmaceuticals. Genetic improvement of safflower is therefore necessary to increase its acceptability and utility as an important oilseed crop, but progress has been limited by the scarcity of gene information resources. To accelerate the improvement of safflower, its genome was sequenced recently. The development of safflower gene resources has the potential to speed up molecular breeding and can be used to investigate patterns of genome evolution.
The genomic information of safflower has not yet been fully uncovered, which calls for further analysis of its molecular mechanisms, regulatory networks and functional diversity. A draft genome assembly of safflower covering 866 million bp was produced by sequencing a single short-insert library to ~21× depth[22]. Full-length transcriptome sequencing of safflower generated 10.43 GB of clean data, from which 38,302 redundant sequences were captured[23]. We previously completed a de novo transcriptome assembly of safflower by Solexa-based deep sequencing, from which we identified putative oleosin genes[24], and investigated some genes related to the biosynthesis of safflower yellow[25]. In addition, we have sequenced the genome of safflower (Accession: PRJNA399628 ID: 399628). In this study, we screened 52 bZIP family genes from the safflower genome database and named them CtbZIP1-52. We analyzed the structural characteristics of the CtbZIP family genes, identified CtbZIP motifs and constructed a phylogenetic tree, and we also inferred the regulatory network relationships and functional diversification among the members of the CtbZIP family. The expression patterns of the 52 CtbZIP genes in various tissues and at different developmental stages were predicted from RPKM values, and the accuracy of the expression profiles was verified by RT-qPCR. This study provides a comprehensive genome-wide investigation and expression analysis of the safflower CtbZIP family, which should be important for the functional characterization of CtbZIP TFs involved in biological processes and transcriptional regulatory networks, and it lays a foundation for future molecular breeding of safflower.

Results

Genome-wide identification of CtbZIP TFs

Through in silico analyses (detailed in Methods), a total of 52 members of the safflower bZIP gene family were identified. Based on the splicing results of the genomic sequence, we provisionally sorted the 52 CtbZIP proteins by unigene number in ascending order and named them CtbZIP1-52 (Table 1). Their name, ID, open reading frame (ORF) length, polypeptide length and conserved domain position are listed in Table 1, while molecular weight, isoelectric point (pI) and grand average of hydropathicity (GRAVY) are given in supplementary file S1, Table S1. The predicted molecular weights range from 11.98 kDa to 86.19 kDa, and pI ranges from 4.86 to 9.78. According to the domain positions in Table 1, CtbZIP38 has the shortest conserved domain (21 amino acids), whereas CtbZIP11 has the longest (74 amino acids). All GRAVY values are negative, indicating that the proteins are hydrophilic.
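GRAVY is conventionally the mean Kyte-Doolittle hydropathy value over all residues of a protein, so a negative value means hydrophilic residues dominate. The following is a minimal sketch of that calculation, not the ProtParam implementation itself. The function name is hypothetical and the inputs are toy peptides rather than CtbZIP sequences.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean hydropathy over the standard residues of a protein sequence."""
    # Non-standard symbols (e.g. X or *) are skipped in this sketch.
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Toy peptides: a charged one (negative GRAVY) and an aliphatic one (positive GRAVY).
print(gravy("KKEE"), gravy("ILV"))
```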
Table 1

List of the identified safflower bZIP TFs and their attributes.

S. no | Name | ID in genomic data | Length of ORF (bp) | Number of aa | Conserved domain position (aa)
1 | CtbZIP1 | ccg000574 | 2,052 | 683 | 197–258
2 | CtbZIP2 | ccg001746 | 795 | 264 | 186–229
3 | CtbZIP3 | ccg001897 | 981 | 326 | 238–284
4 | CtbZIP4 | ccg001910 | 1,113 | 370 | 240–277
5 | CtbZIP5 | ccg002216 | 1,050 | 349 | 107–168
6 | CtbZIP6 | ccg002309 | 516 | 171 | 25–82
7 | CtbZIP7 | ccg002770 | 852 | 283 | 236–278
8 | CtbZIP8 | ccg003603 | 1,101 | 366 | 289–338
9 | CtbZIP9 | ccg005204 | 594 | 197 | 71–130
10 | CtbZIP10 | ccg005666 | 945 | 314 | 178–224
11 | CtbZIP11 | ccg007675 | 1,044 | 347 | 150–223
12 | CtbZIP12 | ccg010618 | 810 | 269 | 90–148
13 | CtbZIP13 | ccg011035 | 891 | 296 | 138–188
14 | CtbZIP14 | ccg012241 | 1,710 | 569 | 525–550
15 | CtbZIP15 | ccg012409 | 1,302 | 433 | 354–405
16 | CtbZIP16 | ccg012979 | 1,152 | 383 | 295–357
17 | CtbZIP17 | ccg013975 | 1,218 | 405 | 41–92
18 | CtbZIP18 | ccg014366 | 1,119 | 372 | 243–281
19 | CtbZIP19 | ccg014560 | 567 | 188 | 75–133
20 | CtbZIP20 | ccg014897 | 339 | 112 | 57–127
21 | CtbZIP21 | ccg015074 | 429 | 142 | 39–88
22 | CtbZIP22 | ccg015568 | 318 | 105 | 34–85
23 | CtbZIP23 | ccg016238 | 426 | 141 | 56–96
24 | CtbZIP24 | ccg016704 | 1,422 | 473 | 257–309
25 | CtbZIP25 | ccg016850 | 471 | 156 | 44–95
26 | CtbZIP26 | ccg016950 | 477 | 158 | 48–108
27 | CtbZIP27 | ccg016979 | 1,167 | 388 | 252–314
28 | CtbZIP28 | ccg017531 | 780 | 259 | 165–222
29 | CtbZIP29 | ccg017533 | 750 | 249 | 165–209
30 | CtbZIP30 | ccg017757 | 465 | 154 | 33–77
31 | CtbZIP31 | ccg017772 | 417 | 138 | 27–85
32 | CtbZIP32 | ccg018261 | 1,245 | 414 | 309–371
33 | CtbZIP33 | ccg018785 | 1,593 | 530 | 401–459
34 | CtbZIP34 | ccg018897 | 1,509 | 502 | 365–409
35 | CtbZIP35 | ccg019734 | 1,041 | 346 | 56–95
36 | CtbZIP36 | ccg020071 | 444 | 147 | 24–63
37 | CtbZIP37 | ccg021349 | 519 | 172 | 55–108
38 | CtbZIP38 | ccg022221 | 639 | 212 | 171–191
39 | CtbZIP39 | ccg022423 | 459 | 152 | 65–116
40 | CtbZIP40 | ccg022869 | 1,560 | 519 | 215–256
41 | CtbZIP41 | ccg023166 | 1,257 | 418 | 216–268
42 | CtbZIP42 | ccg023405 | 969 | 322 | 241–280
43 | CtbZIP43 | ccg025170 | 1,005 | 334 | 201–245
44 | CtbZIP44 | ccg025235 | 1,755 | 584 | 432–490
45 | CtbZIP45 | ccg027083 | 1,197 | 398 | 92–125
46 | CtbZIP46 | ccg028775 | 627 | 208 | 93–130
47 | CtbZIP47 | ccg029577 | 390 | 129 | 34–93
48 | CtbZIP48 | ccg030605 | 876 | 291 | 240–283
49 | CtbZIP49 | ccg031009 | 594 | 197 | 84–134
50 | CtbZIP50 | ccg031237 | 960 | 319 | 149–205
51 | CtbZIP51 | ccg031238 | 780 | 259 | 88–145
52 | CtbZIP52 | ccg031837 | 636 | 211 | 84–138

ORF open reading frame, aa amino acids.
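The columns of Table 1 are linked by simple arithmetic, which gives a quick way to sanity-check individual rows. The sketch below is illustrative and not part of the original analysis. It assumes each ORF length includes the stop codon, and the helper names are hypothetical. The four rows are copied from Table 1.

```python
# name -> (ORF length in bp, protein length in aa, domain start, domain end), from Table 1
TABLE1_ROWS = {
    "CtbZIP1": (2052, 683, 197, 258),
    "CtbZIP11": (1044, 347, 150, 223),
    "CtbZIP38": (639, 212, 171, 191),
    "CtbZIP52": (636, 211, 84, 138),
}


def protein_length_from_orf(orf_bp):
    """Number of codons in the ORF minus the stop codon (assumed to be included)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3 - 1


def domain_length(start, end):
    """Length of a domain given 1-based inclusive coordinates."""
    return end - start + 1


for name, (orf_bp, n_aa, start, end) in TABLE1_ROWS.items():
    consistent = protein_length_from_orf(orf_bp) == n_aa
    print(name, "ORF/aa consistent:", consistent, "domain length:", domain_length(start, end))
```

For these rows the ORF and protein lengths agree, and the domain coordinates give spans of 62, 74, 21 and 55 residues respectively.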


Classification of the CtbZIP proteins based on phylogram

We constructed a phylogenetic tree to elucidate the evolutionary relationships among the bZIP TFs of Carthamus tinctorius L., Arabidopsis thaliana, Oryza sativa and Ricinus communis (Fig. 1). The Arabidopsis thaliana bZIP family has been classified into 13 subfamilies[4], and the bZIP TFs of most plant species are classified according to the Arabidopsis subfamilies. For example, the bZIP proteins of Oryza sativa were divided into 10 subfamilies[5], those of Ricinus communis into 9[26] and those of Camellia sinensis into 11[27]. Following the Arabidopsis classification, we divided the 52 CtbZIP TFs into 12 subfamilies (CtbZIP-A, CtbZIP-B, CtbZIP-C, CtbZIP-D, CtbZIP-E, CtbZIP-F, CtbZIP-G, CtbZIP-H, CtbZIP-I, CtbZIP-J, CtbZIP-S and CtbZIP-X). CtbZIP13, CtbZIP14, CtbZIP20 and CtbZIP46 could not be assigned to any Arabidopsis subfamily and were therefore grouped together into a separate branch named subfamily X. No CtbZIP protein clustered into subfamilies K or M, suggesting that these proteins were lost during safflower evolution. A separate phylogenetic reconstruction of the evolutionary relationship between Arabidopsis and safflower bZIP proteins is given in Figure S1.
Figure 1

Cladogram of the bZIP TFs of safflower, Arabidopsis thaliana, Oryza sativa and Ricinus communis. All bZIP TFs clustered into 12 distinct clades, marked by curves of different colors. Subfamily S has the highest number of members (15), subfamilies J, H, C and B have the lowest (1 each), and 4 CtbZIPs clustered exclusively together in subfamily X.


Motif analysis of the CtbZIP proteins

Besides the bZIP conserved domain, bZIP proteins usually contain other motifs that might bind potential functional sites and thereby activate their function. Using the ORF Finder tool of the NCBI database, we found a complete open reading frame in all 52 CtbZIP transcripts. A search of the Pfam database[28] showed that each protein contains one or more intact conserved domains (bZIP_1, bZIP_2 and bZIP_Maf), while 20 conserved motifs were identified using MEME software[29]; their names and sequence logos are shown in Fig. 2. We determined the width and E-value of each conserved motif using TBtools[30] (Fig. 3A) and plotted the number of motifs in each subfamily (Fig. 3B). In terms of size, motif 20 was the shortest (20 aa), while motifs 3, 11, 12, 17 and 18 were the longest (50 aa each). The average motif width was around 38 aa. Interestingly, motif 1 and motif 2 were recognized as bZIP conserved domains and were found in all subfamilies; however, some subfamilies also had unique motif compositions (Fig. 3B,C). For example, motif 6 is unique to subfamily A, motif 11 to subfamily D, motif 17 to subfamily I and motif 13 to subfamily S. In the safflower-specific subfamily X, CtbZIP20 and CtbZIP46 specifically contain motif 19, which lies near the N-terminus of the amino acid sequence and is substantially identical to the bZIP conserved domain (Fig. 3C). These motifs suggest group-specific functions for the members of each group.
Figure 2

Sequence logos of CtbZIPs conserved motifs. From the multiply aligned protein sequences of CtbZIP, 20 conserved motifs were determined. Among them, Motif1 and Motif2 were common in all orthologs of the phylogenetic tree.

Figure 3

Conserved domain analysis of CtbZIP proteins in 12 subfamilies. (A) Width and E-value of sequence logos for 20 motifs. The blue portion indicates width and orange indicates E-value. (B) The number of motifs in each subfamily. (C) Distribution of conserved structures in all 52 CtbZIP proteins.


Functional differentiation of CtbZIP TFs

Some motifs of bZIP TFs participate in a variety of physiological processes. To understand their roles in biological processes, we predicted the functions of the CtbZIP TFs in silico using Gene Ontology (GO) terms[31]. All 52 CtbZIP TFs were analyzed, and 45 of them were assigned to the three primary GO categories: biological process (BP), molecular function (MF) and cellular component (CC) (Fig. 4). Among these 45 CtbZIPs, none was assigned to only a single GO category. Six CtbZIP TFs (13%) were assigned to all three categories (CC, BP and MF), while 39 (87%) were assigned to both BP and MF. CtbZIPs thus appear to have multiple functions that affect biological processes in safflower. The 45 CtbZIP TFs were further classified into 13 subcategories, accounting for 57% of the enrichment data. The enrichment analysis showed that CtbZIP TFs of 11 subfamilies, all except subfamily J, were enriched (Fig. 5A), and 6 subcategories were significantly enriched. The majority of CtbZIP TFs have DNA-binding activity (Fig. 5B) and participate in nitrogen metabolism. A number of CtbZIP TFs might respond to various abiotic stresses. All CtbZIP TFs have transcriptional regulatory activity, which allows them to regulate the growth and development of safflower. Based on these findings, CtbZIPs may be associated with various biosynthetic and metabolic processes in response to abiotic and biotic stresses, thereby affecting the development of various tissues and organs.
Figure 4

Venn diagram of the functional categorization of CtbZIP TFs. BP denotes biological process, MF stands for molecular function and CC for cellular component.

Figure 5

Variation of transcript function class in the CtbZIP family. (A) Distribution of each Subclass in 13 subspecies of Safflower CtbZIP TFs. (B) Variation of the functional categories of the 52 CtbZIP TFs.


Expression profiles and network analysis of CtbZIP TFs

The bZIP TFs are not only among the most widely distributed and most conserved eukaryotic transcription factors, but they are also functionally diverse. The safflower bZIP TFs have a variety of functions and can act synergistically. To explore the expression profiles of the CtbZIP TFs and the interactions among them, we analyzed their expression in different tissues (roots, stems, leaves, flowers, DAF10-seeds, DAF13-seeds and DAF20-seeds) using a heatmap (Fig. 6). CtbZIP13 is highly expressed in roots. In stems, CtbZIP6 and CtbZIP25 transcripts are abundant, while those of CtbZIP40, CtbZIP23 and CtbZIP29 are scarce. CtbZIP13 and CtbZIP25 are expressed more highly in flowers than in other samples. CtbZIP5 is highly expressed in DAF13-seeds and CtbZIP52 in DAF20-seeds. In contrast, the expression level of CtbZIP22 is almost the same in all 7 samples. These varied expression patterns suggest that the functions of different groups of CtbZIP TFs have diverged along with their expression.
Figure 6

Heatmap of CtbZIP genes expressed among 7 tissues based on the fold change (log2) in RPKM values. The color scale at the top represents log2-transformed RPKM (reads per kilobase of transcript per million mapped reads) values.
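RPKM normalizes a raw read count by both transcript length and sequencing depth, and the heatmap then shows these values on a log2 scale. The sketch below uses the standard RPKM definition. The pseudocount of 1 added before the log is an assumption made here to avoid log2(0); the source does not say how zero values were handled. The function names and numbers are hypothetical.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_rpkm(value, pseudocount=1.0):
    """log2 transform; the pseudocount is an assumption of this sketch."""
    return math.log2(value + pseudocount)


# Hypothetical numbers: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example, log2_rpkm(example))
```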

We quantified the expression of all 52 CtbZIP TFs in different tissues and in seeds at various developmental stages. The expression networks (p ≤ 0.05) (Fig. 7) were constructed using BioLayout Express 3D 3.2 software[32]. The CtbZIP network is complex, with 51 nodes and 1,199 edges. Among the nodes, 43 transcripts (85%) tend to show associated expression and form a co-expression network, whereas the other 8 transcripts exhibit only weak co-expression. The network is composed of 5 clusters; the largest contains sixteen transcripts and the smallest contains eight. There is a certain degree of correlated expression between these clusters, and this tendency was statistically significant. These results indicate that although the functions and expression of CtbZIP family members have diverged dramatically, they retain, to some extent, a tendency towards correlated expression and functional cooperation.
Figure 7

Network analysis of the CtbZIP TFs in seven different tissues of safflower. The co-expression network was constructed from the expression of 52 CtbZIP genes at P ≤ 0.05. It consists of 51 gene nodes and 1,199 edges. Five clusters in the network are shown separately.
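A co-expression network of this kind can be sketched as follows: correlate every pair of expression profiles and keep an edge when the correlation is significant. This is a simplified illustration, not the BioLayout Express 3D procedure. It assumes Pearson correlation across the seven tissues and a two-sided t-test on r at α = 0.05 (critical t ≈ 2.571 for 5 degrees of freedom). The gene names and profiles are hypothetical.

```python
import math
from itertools import combinations

# Two-sided critical t for alpha = 0.05 with df = 7 tissues - 2 = 5.
T_CRIT_DF5 = 2.571


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, t_crit=T_CRIT_DF5):
    """Test r against the critical t value (t_crit must match df = n - 2)."""
    if abs(r) >= 1.0:
        return True
    t = abs(r) * math.sqrt((n - 2) / (1 - r * r))
    return t >= t_crit


def coexpression_edges(profiles):
    """Return (gene_a, gene_b, r) for every significantly correlated pair."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if is_significant(r, len(profiles[a])):
            edges.append((a, b, r))
    return edges


# Hypothetical 7-tissue expression profiles.
toy_profiles = {
    "geneA": [1, 2, 3, 4, 5, 6, 7],
    "geneB": [2, 4, 6, 8, 10, 12, 15],
    "geneC": [5, 1, 4, 2, 6, 3, 4],
}
print(coexpression_edges(toy_profiles))
print("maximum edges among 51 nodes:", math.comb(51, 2))
```

For scale, 51 nodes allow at most 1,275 undirected edges, so a network with 1,199 edges connects roughly 94% of all possible gene pairs.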


Expression analysis of CtbZIP TFs in various tissues

To verify the predicted expression patterns, we measured the expression levels of the 52 CtbZIP genes in different safflower tissues (roots, stems, leaves, flowers, seeds, cotyledons and hypocotyls) using RT-qPCR (Fig. 8). CtbZIP25 is highly expressed in all tissues, and we speculate that it may be involved in various stages of plant growth and development. CtbZIP13 is highly expressed in roots and might play a role in root growth. In seeds, CtbZIP52 has the highest expression and might regulate seed development. Likewise, CtbZIP25 and CtbZIP30 are more highly expressed in hypocotyls. The expression levels of CtbZIP6 and CtbZIP25 peak in stems, where they may affect stem growth. Conversely, the expression levels of CtbZIP2, CtbZIP23, CtbZIP31 and CtbZIP34 are relatively low in all tissues; CtbZIP34 is the lowest in roots, while CtbZIP2, CtbZIP22, CtbZIP31 and CtbZIP47 are the lowest in stems. Similarly, CtbZIP23 and CtbZIP47 have the lowest expression in leaves, CtbZIP23 in flowers, and CtbZIP19, CtbZIP20 and CtbZIP23 in seeds. CtbZIP22, however, is expressed in cotyledons and hypocotyls after seed germination, which suggests that it is specifically involved in seed germination. In short, the RT-qPCR results show that the expression patterns of the safflower CtbZIP genes are consistent with the predicted expression. This expression pattern allows the functions of CtbZIP TFs to be estimated more reliably.
Figure 8

Relative expression profiles of the CtbZIP genes in various tissues. (A) Root, (B) stem, (C) leaf, (D) flower, (E) seed, (F) cotyledon, (G) hypocotyl. The reference gene used in RT-qPCR is EF1α. Values are average of three replicates ± SD. Asterisks indicate significant difference applying ANOVA (p < 0.05, p < 0.01 and p < 0.001).
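Relative expression values such as those plotted in Fig. 8 are commonly derived from Ct values normalized to a reference gene. This section does not state which quantification method was used, so the sketch below assumes the widely used 2^-ΔΔCt calculation with EF1α as the reference, followed by the mean ± SD of three replicates. All Ct values and function names are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """2^-ddCt: target normalized to the reference gene, relative to a calibrator sample."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2 ** -(delta_sample - delta_calibrator)


def summarize(replicates):
    """Mean and sample standard deviation across biological replicates."""
    return mean(replicates), stdev(replicates)


# Hypothetical Ct values: (target Ct, EF1a Ct) for three replicates of one tissue,
# compared against a calibrator sample with target Ct 24.0 and EF1a Ct 18.0.
replicate_cts = [(22.0, 18.0), (22.2, 18.1), (21.9, 18.0)]
values = [relative_expression(t, r, 24.0, 18.0) for t, r in replicate_cts]
print(summarize(values))
```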


Expression level of CtbZIP TFs with GA3 treatment under different light radiation

To study the function of CtbZIP TFs, we measured the expression levels of all 52 CtbZIP genes by RT-qPCR after GA3 treatment under different light radiation (no treatment under 16.8 MJ/m2 light radiation, no treatment under 5.04 MJ/m2 light radiation, GA3 treatment under 5.04 MJ/m2 light radiation and GA3 treatment under 16.8 MJ/m2 light radiation) (Fig. 9). Among the 52 genes, the expression of CtbZIP15, CtbZIP26 and CtbZIP28 increased strongly in all seven tissues, whereas CtbZIP28 and CtbZIP38 were upregulated in six tissues (all except roots) after GA3 treatment under 16.8 MJ/m2 light radiation. Similarly, CtbZIP2, CtbZIP33, CtbZIP50 and CtbZIP51 were upregulated in roots and leaves, and CtbZIP6, CtbZIP36, CtbZIP49 and CtbZIP52 in seeds. CtbZIP8 and CtbZIP15 were significantly affected by illumination intensity, and their expression increased in leaves, flowers, seeds, cotyledons and hypocotyls. Likewise, in seeds, CtbZIP35, CtbZIP40 and CtbZIP45 were upregulated after GA3 treatment under 5.04 MJ/m2 light radiation, and CtbZIP16, CtbZIP27 and CtbZIP32 were induced by GA3 and light in cotyledons and hypocotyls.
Figure 9

Expression profiles of CtbZIP genes after GA3 treatment under different light radiation. The red bars indicate no treatment under 16.8 MJ/m2 light radiation (group 1). The orange indicates no treatment under 5.04 MJ/m2 light radiation (group 2). Yellow indicates GA3 treatment under 5.04 MJ/m2 light radiation (group 3). Green indicates GA3 treatment under 16.8 MJ/m2 light radiation (group 4). The reference gene was EF1α. Values are average of three replicates ± SD. Asterisks indicate significant difference applying ANOVA (p < 0.05, p < 0.01 and p < 0.001).


Discussion

Safflower is an important plant used for ornamental, food, feed and medicinal purposes. It is a hardy plant in terms of tolerance to abiotic stresses such as water deficit; however, given the increasing demand for edible oil and its wide range of pharmaceutical properties, its improvement requires a comprehensive understanding through omics. By combining genomics, transcriptomics, proteomics and metabolomics, much like solving a puzzle, omics attempts to obtain a clear picture of the molecular and biochemical circuitry underlying primary and secondary metabolites and products[33]. In the same spirit, we set out to characterize an important class of transcription factors in Carthamus tinctorius L. Transcription factors regulate gene expression and are therefore present in essentially all organisms. They comprise several classes with fundamental roles in various growth and developmental processes. The bZIP gene family plays a role in plant growth and in seed and fruit development[34]. Plant bZIP TFs preferentially bind promoters containing an ACGT core sequence, including A- (TACGTA), C- (GACGTC) and G- (CACGTG) boxes[35]; however, nonpalindromic targets have also been reported[36,37]. The genome-wide analysis of CtbZIPs should aid their further functional analysis as well as safflower breeding research.
Our genomic survey identified 52 members of the Carthamus tinctorius bZIP TF family. bZIP TFs constitute large families in all organisms reported to date. The CtbZIPs also form a large gene family; however, compared with Arabidopsis (78 TFs), rice (89), maize (125), Brassica napus (247) and soybean (131), safflower has a relatively small bZIP family. Based on phylogenetic reconstruction (Fig. 1), we categorized the CtbZIPs into 12 subfamilies (A-J, S and X) according to their relationships with the bZIPs of Arabidopsis[4], rice[5], Ricinus communis[26] and Camellia sinensis[27]. This categorization was further supported by protein structure analyses.
No CtbZIP protein clustered into subfamilies K or M, suggesting that these proteins were lost during safflower evolution. Protein structure analyses of the CtbZIPs revealed 20 motifs in total, the same number as reported in Manihot esculenta[38], which were named sequentially from motif1 to motif20 (Figs. 2, 3). Comparing these motifs with known motifs revealed some functions of the CtbZIP TFs. Motif2 was identified as an extension of the leucine zipper region, closely related to motif1. Motif4 was a new, highly conserved cysteine-rich sequence that might be involved in protein–protein or protein–DNA interactions. In most cases, the motif1 and motif2 conserved domains are located next to each other; in some proteins, however, they lie far apart. The maximum distance between the two motifs is found in CtbZIP45 of subfamily D. In addition, in subfamily E of the CtbZIP TFs, three motifs (motif4, 5 and 13) lie between the bZIP domains, and motif1 and motif2 together with these three motifs form a conserved structural group, as in subfamily E of the OsbZIPs[5]. A similar situation exists in subfamily I, where motif1 and motif2 together with motif4, 5, 9 and 17 form a conserved structural group, although motif9 does not lie between motif1 and motif2. The conserved groups of subfamilies E and I are located near the C-terminus, which suggests that the functions of these two subfamilies could differ significantly from those of the other subfamilies. In subfamily H, CtbZIP26 contains only the bZIP domain (motif1 and motif2), which suggests that the function of CtbZIP26 is more conserved. Motif11 in subfamily D corresponds to the conserved DOG1 domain (PF14144), which is also found in Arabidopsis bZIPs[4]. This domain family appears to be a highly specific controller of seed dormancy. On the one hand, the MEME results support the reliability of the Hidden Markov Model (HMM) results. On the other hand, they reveal the functional diversity of the CtbZIP family.
These analyses are an important starting point for further functional verification. The genome-wide expression prediction for the CtbZIP genes showed differential transcript levels across developmental stages and tissues. As shown in Fig. 6, expression patterns diverge widely with respect to tissue type and seed stage. This varied expression pattern suggests functional divergence among different groups of CtbZIP TFs. We quantified the expression of all 52 CtbZIP TFs in different tissues and in seeds at various developmental stages. The resulting network is composed of 5 clusters, as shown in Fig. 7. There is a certain degree of correlated expression between these clusters, and this tendency was statistically significant. These results indicate that although the functions and expression of CtbZIP family members have diverged dramatically, they retain, to some extent, a tendency towards correlated expression and functional cooperation. To verify the transcript abundance of the CtbZIP genes, we used RT-qPCR to evaluate their expression in root, stem, leaf, flower, seed, cotyledon and hypocotyl (Fig. 8). The RT-qPCR results showed that the expression patterns of the safflower CtbZIP genes are consistent with the predicted expression, which allows the functions of CtbZIP TFs to be estimated more reliably.
Light and hormones are key factors that directly affect plant growth and development. It has been confirmed that the subfamily A bZIP members of Arabidopsis thaliana are mainly involved in ABA signaling[39], whereas subfamilies H and G regulate the light response[14,40]. In rice, OsbZIP12 has been reported as a positive regulator of ABA signalling[41], while in Manihot esculenta, bZIP11, 27, 52 and 64 were upregulated at time points during ABA treatment[38].
In Ipomoea trifida, eight bZIP genes were upregulated in at least one tissue type and at one or more time points in response to ABA treatment[42]. AtbZIP16 has been reported to regulate early seedling development by integrating hormone and light signalling pathways, thereby promoting germination as well as hypocotyl elongation[43]. Under red light (RL) treatment, ClabZIP6 and ClabZIP56 were significantly induced, while ClabZIP37 and ClabZIP22 were repressed, in leaves of Citrullus lanatus[44]. The changes in CtbZIP expression under GA3 and light shown in Figure 9 suggest that some CtbZIP genes might be directly or indirectly affected by light intensity and hormones. These results provide a basis for further exploration of the function of CtbZIP TFs.
In summary, our study provides a genome-wide analysis of the safflower bZIP family. We screened 52 CtbZIP TFs and divided them into 12 subfamilies by identifying the conserved homology between them. Their basic physical and chemical properties were analyzed, including ORF length, number of amino acids and conserved domain positions. A total of 20 conserved motifs were found in the CtbZIP TF family. All CtbZIP TFs contain a typical conserved bZIP_1 domain. In the enrichment analysis, 45 of the 52 CtbZIPs were enriched, and none of the 45 was assigned to only a single GO functional category. Six CtbZIP TFs were enriched in all three major categories (CC, BP and MF), and 39 were enriched in BP and MF. A total of five clusters were found within the CtbZIP TFs, which constitute a complex interaction network. The expression patterns of the CtbZIP family were predicted by heatmap and verified by RT-qPCR. This study improves our understanding of safflower bZIP TFs and lays the foundation for developing new safflower cultivars through molecular breeding.

Methods

Plant materials and treatments

The JiHong No. 1 safflower seeds, purchased from safflower edge Co. Ltd. in Xinjiang, China, were cultivated in the experimental field of Jilin Agricultural University for multiplication. The collected seeds were germinated in soil and grown at 23 ± 2 °C in a growth room. Cotyledons and hypocotyls emerged after about 7 days, flowers appeared after approximately 100 days and seeds after about 135 days. For light treatment, one set of safflower plants was grown under normal light radiation (16.8 MJ/m2) and another set under weak light radiation (5.04 MJ/m2). For GA3 treatment, plants that had flowered were sprayed with 50 mg/L GA3 once daily for 5 days. Each experimental group was sprayed simultaneously at 10 am. We collected various tissues (leaf, stem, root, flower, cotyledon, hypocotyl and seeds), immediately froze them in liquid nitrogen and stored them at −80 °C for further use.

Identification and characterization of CtbZIP TFs

The sequences of CtbZIP were obtained from the safflower genome database (Accession: PRJNA399628 ID: 399628). We downloaded the HMM profile of bZIP_1 (PF00170) from the Pfam database[28] (https://pfam.xfam.org/) and used it as the query in a Hidden Markov Model (HMM) search for sequences similar to bZIP_1 (P < 0.001). To avoid missing possible bZIP members, NCBI BLAST was performed using the known Arabidopsis bZIP sequences (downloaded from TAIR, https://www.arabidopsis.org/) as queries against the safflower genome database[26]. All possible bZIP TFs were screened using an e-value threshold of < 1 × 10^-5. In addition, the conserved bZIP domains were predicted in all candidate bZIP TFs using SMART[45] (https://smart.embl-heidelberg.de/) and Search Pfam[28] (https://pfam.xfam.org/search/sequence). The resulting high-confidence bZIP TFs were named CtbZIP. Afterwards, we analyzed the physical and chemical properties of the predicted high-confidence CtbZIP TFs with the ProtParam online tool[46] (https://www.expasy.org/).
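
The e-value screening step can be illustrated with a minimal sketch. It assumes the BLAST results were exported in the standard 12-column tabular format (`-outfmt 6`), which the text does not state. The gene identifiers, the example rows and the HMM hit set are hypothetical. Pooling the BLAST and HMM hits reflects the stated aim of not missing members; domain confirmation with SMART and Pfam would still follow as a separate step.

```python
def filter_blast_hits(lines, max_evalue=1e-5):
    """Return subject IDs that have at least one hit with e-value < max_evalue.

    Assumes BLAST tabular output (-outfmt 6): 12 tab-separated columns,
    subject ID in column 2 and e-value in column 11.
    """
    keep = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue < max_evalue:
            keep.add(subject_id)
    return keep


# Hypothetical rows: one strong hit and one weak hit.
example_rows = [
    "AtbZIP1\tCt_gene_001\t62.5\t80\t30\t0\t1\t80\t10\t89\t3e-28\t105",
    "AtbZIP1\tCt_gene_002\t31.0\t58\t40\t0\t5\t62\t7\t64\t0.002\t35.4",
]
blast_candidates = filter_blast_hits(example_rows)

# Hypothetical HMM search hits; the union forms the candidate pool.
hmm_candidates = {"Ct_gene_001", "Ct_gene_003"}
candidates = sorted(blast_candidates | hmm_candidates)
```

Taking the union rather than the intersection keeps sequences found by only one of the two searches, leaving the later domain prediction to remove false positives.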

Phylogenetic analysis of the CtbZIP proteins

The bZIP protein sequences of Arabidopsis and Ricinus communis were downloaded from the PlantTFDB database (https://planttfdb.cbi.pku.edu.cn) and those of rice were downloaded from the Rice Genome Annotation Project[47] (https://rice.plantbiology.msu.edu/index.shtml). Multiple alignment of the full-length bZIP sequences of safflower, Arabidopsis, rice and Ricinus communis was performed using the Clustal X 2.0 program[48] and saved in the Clustal X file format. Using the MEGA 7.0 program[49], we constructed a cladogram with the neighbor-joining algorithm and 1,000 bootstrap replications. The phylogenetic tree was modified using the iTOL online software[50] (https://itol.embl.de/login.cgi).

Motifs analysis of CtbZIP proteins

We searched the open reading frames of CtbZIP genes with the ORF finder at NCBI (https://www.ncbi.nlm.nih.gov/gorf/gorf.html). CtbZIP transcripts were analyzed against the Pfam[28] (https://pfam.sanger.ac.uk/) protein database. The conserved motifs in safflower CtbZIP TFs were further analyzed with the Multiple EM for Motif Elicitation software (MEME[29]) (https://meme.sdsc.edu/meme/cgi-bin/meme.cgi) using default parameters, except that the maximum number of motifs was set to 20 and the motif width to 6-50 aa. A diagram of the conserved motif structure was then generated using TBtools[30] (https://www.tbtools.com/). The related motif information is listed in Table S2.

Gene ontology annotations of CtbZIP TFs

The functions of the CtbZIP TFs were categorized in silico using Blast2GO software[31] (https://www.blast2go.com/). The GO functional categorization of the 52 CtbZIP TFs into each subcategory was used for enrichment analysis. Enrichment of the number of CtbZIP transcripts in each subcategory was assessed with a Chi-square test.
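
As an illustration of a Chi-square test on GO category counts, the sketch below computes the Pearson statistic for a 2×2 table (family versus background, annotated versus not annotated). The choice of background set, the 2×2 layout, the absence of a continuity correction and the toy counts are all assumptions; the text does not specify how the test was set up.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Toy counts (hypothetical): genes with / without a given GO term.
family_with, family_without = 20, 10
background_with, background_without = 10, 20
stat, p_value = chi_square_2x2(
    family_with, family_without, background_with, background_without
)
```

With many GO subcategories tested, a multiple-testing correction would normally be applied to the resulting p-values.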

Network analysis of the CtbZIP TFs

The construction of a co-expression network is conceptually simple and intuitive. Based on the similarity of gene expression, possible interactions between gene products can be analyzed to understand intergenic interactions, which underlie the various traits. To explore the network of interactions within the CtbZIP gene family, we used the R programming language and software[51] to calculate Pearson correlation coefficients. A gene co-expression network was constructed using BioLayout Express 3D Version 3.2 software[32].
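
The correlation step can be sketched as follows. The study used R for this calculation; Python is used here only for illustration. The gene names, expression profiles and the 0.9 cut-off are hypothetical, since the threshold applied before network construction is not given in the text.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """List gene pairs whose absolute correlation reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four tissues.
profiles = {
    "gene_A": [1.0, 2.0, 3.0, 4.0],
    "gene_B": [2.0, 4.0, 6.0, 8.0],
    "gene_C": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(profiles)
```

The resulting edge list (gene pair plus correlation) is the kind of input a network layout tool can cluster and draw.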

Gene expression patterns analysis

To investigate CtbZIP gene family expression patterns, high-throughput safflower transcriptome sequencing data were used to analyze CtbZIP gene expression in roots, stems, leaves, flowers and seeds at DAF10, DAF13 and DAF20. The expression estimates of CtbZIP genes were normalized and expressed as RPKM (reads per kilobase per million mapped reads), and log2 fold-change values were calculated from the ratios of gene expression and used to draw heatmaps with R[51] and TBtools[30] software.
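
The RPKM normalization and log2 ratio can be written out as a short sketch. The RPKM formula is the standard definition. The pseudocount in the fold-change function and the example numbers are assumptions added for illustration; the text does not say which sample served as the reference or how zero values were handled.

```python
from math import log2


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(value, reference, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids log of zero."""
    return log2((value + pseudocount) / (reference + pseudocount))


# Hypothetical gene: 500 reads, 1,000 bp long, 20 million mapped reads.
example_rpkm = rpkm(500, 1000, 20_000_000)

# Hypothetical comparison of two RPKM values (tissue versus reference).
example_lfc = log2_fold_change(7.0, 1.0)
```

In this example the gene has an RPKM of 25, and the second comparison gives a log2 fold change of 2, i.e. a four-fold difference after adding the pseudocount.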

RNA extraction and cDNA synthesis

The experimental materials (various tissues: root, stem, leaf, flower, seed, cotyledon, hypocotyl) were thoroughly ground and transferred into centrifuge tubes. Total RNA was isolated from the tissues using Trizol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. The extracted total RNA was treated with RNase-free DNase (Promega, USA) to remove genomic DNA contamination. RNA quality was assessed from OD260/280 values measured on a NanoDrop 2000 (ThermoFisher Scientific, Beijing, China) and by 1.2% agarose gel electrophoresis. The cDNA was synthesized from the total RNA using the PrimeScript RT reagent kit with gDNA Eraser (Takara, Japan) according to the manufacturer's protocols. First, 2 μL 5 × gDNA Eraser buffer, 1 μL gDNA Eraser, 2 μL total RNA (about 1,000 ng) and 5 μL RNase-free ddH2O were mixed in a tube and incubated at 42 °C for 2 min to remove genomic DNA. The purified RNA was then reverse-transcribed into cDNA by adding 4 μL 5 × PrimeScript buffer, 1 μL PrimeScript enzyme mix I, 1 μL RT primer mix and 4 μL RNase-free ddH2O to the above reaction and incubating at 37 °C for 15 min followed by 85 °C for 15 s. The cDNA was stored at −20 °C.

Real-time fluorogenic quantitative PCR

Real-time fluorogenic quantitative PCR (RT-qPCR) was performed using the SYBR Premix Ex Taq II kit (Takara, Japan) and a Stratagene Mx3000P thermocycler (Agilent) to monitor DNA products. The most stable housekeeping reference gene (EF1α) was selected for the expression analysis in various tissues. The relative expression of CtbZIP was normalized to the expression of EF1α and expressed relative to the level in the various treatments. Gene-specific primers designed for the CtbZIP genes are listed in Table S3. RT-qPCR amplification was performed in a 15 μL reaction volume containing 500 ng template cDNA (1 μL), 0.3 μL of each primer (10 μM), 7.5 μL SYBR Premix Ex Taq (2×), 0.3 μL ROX Reference Dye (10 μM) and 5.6 μL DEPC ddH2O. The RT-qPCR profile was an initial denaturation at 95 °C for 5 min, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing at 60 °C for 30 s. The fold change in relative expression level was calculated using the 2^-ΔΔCt method.
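
Normalizing a target gene to a reference gene such as EF1α is commonly done with the 2^-ΔΔCt calculation. The sketch below assumes that calculation and roughly 100% amplification efficiency for both primer pairs. The Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt calculation.

    Each Ct is the threshold cycle of the target or reference gene
    in the treated sample or in the control (calibrator) sample.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values: target gene and reference gene in a treated
# sample (24, 18) and in the control sample (26, 18).
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

Here the target reaches threshold two cycles earlier in the treated sample after normalization, which corresponds to a four-fold increase.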

Statistical analysis

The experiment was designed with three random replications. All data were analyzed by one-way analysis of variance (ANOVA) and means were separated at the P < 0.05 level. The biological significance of the differential expression among the different tissues and under GA3 treatment was analyzed.
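
A one-way ANOVA compares the variance between group means with the variance within groups. The sketch below computes the F statistic and its degrees of freedom for three groups of three replicates. The data are hypothetical, and the software actually used for the ANOVA is not stated in the text.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual replicates around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative expression values, three replicates per tissue.
replicates = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(replicates)
```

The F statistic is then compared with the F distribution for the given degrees of freedom to obtain a P value; in practice a library routine such as `scipy.stats.f_oneway` would usually be used for that step.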
References (46 in total)

1.  ABFs, a family of ABA-responsive element binding factors.

Authors:  H Choi; J Hong; J Ha; J Kang; S Y Kim
Journal:  J Biol Chem       Date:  2000-01-21       Impact factor: 5.157

2.  Network visualization and analysis of gene expression data using BioLayout Express(3D).

Authors:  Athanasios Theocharidis; Stjin van Dongen; Anton J Enright; Tom C Freeman
Journal:  Nat Protoc       Date:  2009-10-01       Impact factor: 13.491

3.  Analysis of transcription factor HY5 genomic binding sites revealed its hierarchical role in light regulation of development.

Authors:  Jungeun Lee; Kun He; Viktor Stolc; Horim Lee; Pablo Figueroa; Ying Gao; Waraporn Tongprasit; Hongyu Zhao; Ilha Lee; Xing Wang Deng
Journal:  Plant Cell       Date:  2007-03-02       Impact factor: 11.277

4.  Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice.

Authors:  Aashima Nijhawan; Mukesh Jain; Akhilesh K Tyagi; Jitendra P Khurana
Journal:  Plant Physiol       Date:  2007-12-07       Impact factor: 8.340

5.  Soybean GmbZIP44, GmbZIP62 and GmbZIP78 genes function as negative regulator of ABA signaling and confer salt and freezing tolerance in transgenic Arabidopsis.

Authors:  Yong Liao; Hong-Feng Zou; Wei Wei; Yu-Jun Hao; Ai-Guo Tian; Jian Huang; Yun-Feng Liu; Jin-Song Zhang; Shou-Yi Chen
Journal:  Planta       Date:  2008-03-26       Impact factor: 4.116

6.  Arabidopsis bZIP16 transcription factor integrates light and hormone signaling pathways to regulate early seedling development.

Authors:  Wen-Ping Hsieh; Hsu-Liang Hsieh; Shu-Hsing Wu
Journal:  Plant Cell       Date:  2012-10-26       Impact factor: 11.277

7.  20 years of the SMART protein domain annotation resource.

Authors:  Ivica Letunic; Peer Bork
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

Review 8.  A Metabolic Perspective and Opportunities in Pharmacologically Important Safflower.

Authors:  Vimalraj Mani; Seon-Kyeong Lee; Yunsoo Yeo; Bum-Soo Hahn
Journal:  Metabolites       Date:  2020-06-17

9.  Genome-wide identification, structural and gene expression analysis of the bZIP transcription factor family in sweet potato wild relative Ipomoea trifida.

Authors:  Zhengmei Yang; Jian Sun; Yao Chen; Panpan Zhu; Lei Zhang; Shaoyuan Wu; Daifu Ma; Qinghe Cao; Zongyun Li; Tao Xu
Journal:  BMC Genet       Date:  2019-04-25       Impact factor: 2.797

10.  Genome-wide characterization and analysis of bZIP transcription factor gene family related to abiotic stress in cassava.

Authors:  Wei Hu; Hubiao Yang; Yan Yan; Yunxie Wei; Weiwei Tie; Zehong Ding; Jiao Zuo; Ming Peng; Kaimian Li
Journal:  Sci Rep       Date:  2016-03-07       Impact factor: 4.379

