Literature DB >> 34828404

Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Theobroma cacao.

Qianqian Zhang1, Sijia Hou1, Zhenmei Sun2, Jing Chen1, Jianqiao Meng1, Dan Liang1, Rongling Wu1, Yunqian Guo1.   

Abstract

The MADS-box family gene is a class of transcription factors that have been extensively studied and involved in several plant growth and development processes, especially in floral organ specificity, flowering time and initiation and fruit development. In this study, we identified 69 candidate MADS-box genes and clustered these genes into five subgroups (Mα: 11; Mβ: 2; Mγ: 14; Mδ: 9; MIKC: 32) based on their phylogenetical relationships with Arabidopsis. Most TcMADS genes within the same subgroup showed a similar gene structure and highly conserved motifs. Chromosomal distribution analysis revealed that all the TcMADS genes were evenly distributed in 10 chromosomes. Additionally, the cis-acting elements of promoter, physicochemical properties and subcellular localization were also analyzed. This study provides a comprehensive analysis of MADS-box genes in Theobroma cacao and lays the foundation for further functional research.

Entities:  

Keywords:  MADS-box transcription factors; Theobroma cacao; bioinformatics analysis; gene family; genome-wide characterization

Mesh:

Substances:

Year:  2021        PMID: 34828404      PMCID: PMC8622960          DOI: 10.3390/genes12111799

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

MADS-box genes encode eukaryotic transcription factors that play a prominent role in plant development processes. MADS-box proteins contain a highly conserved DNA-binding MADS-domain of approximately 50–60 amino acids in length in their N-terminal region, and this domain could be involved in recognizing and binding the CArG motif of their target gene [1]. The name itself is given by the initials of the four first-discovered transcription factors in this family, which are MCMI in Saccharomyces cerevisiae [2], AGAMOUS in Arabidopsis thaliana [3], DEFICENS in Antirrhinum majus [4] and SRF4 in Homo sapiens [5]. Based on protein domain structure, the MADS-box genes are divided into two categories: type I and type II. The type I MADS-box genes can be further classified into Mα, Mβ, Mγ, Mδ subclasses. Type II lineage, also known as MIKC type, has a special MIKC structure, which is composed of an N-terminal MADS domain, the I (intervening) and K (keratin-like) regions and a variable C-terminal transcriptional activation domain [6]. Type MIKC were further divided into two subgroups, MIKCC and MIKC*, according to their MIKC structural features [7]. The MADS-box gene family is known to have functions in many significant physiological and developmental processes, such as the regulation of floral organ specificity [3,4], control of flowering signals and initiation [8,9], fruit development [10], meristem identify specification [11], and seed development [12]. For example, Wheat VERNALIZATION1 (VRN1) is a key regulator of flowering time and floral meristem determination [13] The MADS-box gene FLOWERING LOCUS C (FLC) controls the vernalization pathway in Arabidopsis [14]. Apple MdDAM1 plays a role in bud dormancy and growth cessation in autumn [15]. Although MADS-box genes are well-known for their roles in the flower developmental process and participating in the classical ABC flower development model, some of them have been validated to function on root and leaf morphogenesis [16,17]. To date, the MADS-box proteins have been characterized in various kinds of plants, including Arabidopsis [18], Populus trichocarpa [19], pineapple [20], Saccharum spontaneum [21], Erigeron breviscapus [22]. However, little is known regarding the MADS-box gene family in Theobroma cacao. Theobroma cacao is an economically important tropical tree, native to South America, which is planted in large quantities for its fruits (cacao pods), where its beans were used as the raw material for making chocolate, coco butter, cosmetics and confectionery [23]. Additionally, some studies have proposed that an ingredient found in coco might exert cardiovascular benefits [24]. Research into the sequencing and assembling genome of Theobroma cacao was carried out in 2010 [25], leading to the genome-wide identification and analysis of important gene families such as the NAC domain transcription factor family [26], WRKY transcription factor family [27], and GPX family [28]. The metabolome and transcriptome profiling of the Theobroma cacao pods was completed [29]. In this scenario, we conducted a bioinformatics analysis of MADS-box members of Theobroma cacao at the gene level. We identified 69 MADS-box gene members, investigated their phylogenetic relationship, classified them, and analyzed gene structures, motifs, and chromosome location. Moreover, subcellular localization and cis-acting elements were also performed. Our results may provide a basis for further functional studies of coco tree genes and references for subsequent research into molecular mechanisms.

2. Materials and Methods

2.1. Identification of MADS-Box Genes in Theobroma cacao

Theobroma cacao genome sequences and annotation files were provided by Ensembl Plants (http://plants.ensembl.org/index.html, accessed on 16 April 2021. The hidden Markov model (HMM) profile of the MADS-domain was retrieved from the Pfam database (release 34.0; http://pfam.xfam.org/, accessed on 16 April 2021) with the accession number ‘PF00319’ [30]. MADS-box proteins in Theobroma cacao were searched using the following two approaches. First, the downloaded HMM profile was employed using the HMMER v3.3.2 program to search proteins containing the MADS-domain. Secondly, to avoid missing candidates, we constructed a new HMM model with proteins with e-value < 1 × 10−20, and ClustalW (version 2.1) was used for multiple sequence alignments [31]. The new model was used to search all Theobroma cacao protein sequences using HMMER (version 3.3.2), with a cut-off e-value of 0.05. Additionally, the predicted proteins were invalidated by conducting protein domain searches on the SMART program (http://smart.embl-heidelberg.de/, accessed on 19 June 2021) and NCBI Conserved Domain Search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, accessed on 19 June 2021) to confirm the presence of the MADS-domain in all candidate proteins.

2.2. Phylogenetic Analysis and Classification of MADS-Box Genes

To understand the phylogenetic relationship and to classify the MADS-box genes, a rooted neighbor-joining (NJ) phylogenetic tree for Theobroma cacao (TcMADS) and Arabidopsis MADS-box proteins was constructed using MEGA X software (version 10.2.2) [32]. The TcMADS gene family was classified according to their phylogenetic relations with corresponding Arabidopsis MADS-box members. Arabidopsis MADS-box protein sequences were downloaded from TAIR (https://www.arabidopsis.org/, accessed on 24 July 2021) with the accession numbers reported by Parenicová et al [18]. All protein sequences were aligned by Muscle with the default parameters [33]. The Neighbor-Joining method was used, with the following parameters: 1000 replications for bootstrap method, Poission model, Pairwise deletion. Additionally, an individual phylogenetic tree of TcMADS genes was built with the same method and beautified by ggtree [34].

2.3. Conserved Motif and Gene Structure Analysis

Online program MEME (https://meme-suite.org/meme/tools/meme, accessed on 29 July 2021) was applied to analyze the conserved motifs in the MADS-box protein with the following settings: maximum number of motifs 10, minimum motif width 6, maximum motif width 50, number of repetitions any [35]. The intron–exon structure information was contained in the Theobroma cacao gtf file downloaded from Ensembl Plants. Conserved motif and gene structure were both visualized by TBtools software (version 0.665). The online tools ProtParam (https://web.expasy.org/protparam/, accessed on 10 August 2021) and Compute pI/Mw (https://web.expasy.org/compute_pi/, accessed on 10 August 2021) was employed to analyze physicochemical properties including theoretical isoelectric points (PI), average molecular weight (MW), instability index and aliphatic index. Number of amino acids (aa) and open reading frame (ORF) lengths were both found with the ORFfinder website (https://www.ncbi.nlm.nih.gov/orffinder/, accessed on 11 August 2021). The BUSCA program (https://busca.biocomp.unibo.it/, accessed on 4 August 2021) was used to predict TcMADS proteins’ subcellular localization (SL).

2.4. Chromosomal Localization and Gene Duplication

The locational information on the chromosomes and chromosome length of TcMADS genes was acquired from Ensembl Plant. All identified genes were mapped to 10 chromosomes with MG2C (http://mg2c.iask.in/mg2c_v2.1/, accessed on 25 June 2021) according to their chromosomal positions and relative distance. TcMADS gene potential duplication was confirmed based on major criteria as follows: (a) sequence alignment length cover > 75% of longer sequence, and (b) the similarity of the aligned region > 75% [36]. Bio-Linux was used to screened tandem repeat sequences. The TcMADS protein sequences were aligned by MAFFT (version 7.481), and then multiple protein alignment were confirmed and the corresponding DNA sequences were sorted into codon alignments [37], which were used to calculate the Ka/Ks ratios using KaKs calculator Toolbox 2.0 (version 2.0).

2.5. Analysis of Cis-Acting Element in MADS-Box Genes’ Promoters

The upstream sequences (2 kb) of TcMADS genes’ CDS were retrieved from the Theobroma cacao genome by TBtools software according to gene ID, and then submitted to PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 5 August 2021) to identify four cis-acting elements, including light-responsive elements, wound-responsive elements, gibberellin-responsive elements, and auxin-responsive elements, after filtering and screening. The variety and quantity of cis-acting elements upstream each gene was found with TBtools.

3. Results

3.1. Identification of MADS-Box Genes in Theobroma cacao

To identify the MADS-box genes in Theobroma cacao, two HMM analyses were performed: after removing duplicates, a total of 68 putative MADS proteins were obtained by first HMMER searches, using the MADS domain profile as a query, in the coco tree protein database. For the second HMM analysis, we selected proteins which e-value > 0.05 as candidate members, choosing the longest transcript for each screened gene, and thus generating 69 MADS-box genes after confirming MADS domain by SMART and NCBI Conserved Domain Search Service (Supplementary File S1). These 69 MADS-box genes were sequentially renamed from TcMADS1 to TcMADS69 based on their chromosomal location and subjected to further analyses. Detailed characteristics, including number of amino acids (aa), average molecular weight (MW), theoretical pI, instability index, and aliphatic index about TcMADS genes, are listed in Table 1. The statistical results showed that the protein length varied, ranging from 78 (TcMADS23) to 600 (TcMADS7) amino acids, with an average length of amino acids, and the molecular weights varied from 66752.75 Da (TcMADS23) to 8995.45 Da (TcMADS7). Additionally, thirteen MADS-box proteins were acidic, with pI values less than 6.5; 52 were alkaline, with pI values greater than 7.5; four were neutral, with a pI are between 6.5 and 7.5. The instability index analysis indicated that most of the TcMADS proteins were unstable, with an instability index greater than 40, except for TcMADS12, TcMADS37, TcMADS57, TcMADS67, TcMADS1, TcMADS55, TcMADS9, TcMADS47. The subcellular localization prediction of TcMADS genes was analyzed by BUSCA tools. From the analysis results, most TcMADS genes appeared to mainly be located in the nucleus (63.77%) and chloroplast (34.78%), with only TcMADS11 found in the endomembrane system.
Table 1

Detailed information regarding MADS-box gene family in Theobroma cacao.

Gene NameGene IDPhysicochemical CharacteristicsSLORF
PIMW (Da)Length (aa)Instability IndexAliphatic Index
TcMADS1 TCM_0002399.4846,436.940639.0977.32nucleus1221
TcMADS2 TCM_0002669.5926,095.822446.5287.1chloroplast675
TcMADS3 TCM_0007259.9136,097.6831558.2383.87nucleus948
TcMADS4 TCM_0008786.8526,140.8923741.1178.23chloroplast714
TcMADS5 TCM_0009319.1229,034.1425045.3384.56chloroplast753
TcMADS6 TCM_0009926.5525,353.7922453.3285.76nucleus675
TcMADS7 TCM_0011815.8666,752.7560048.7677.52nucleus1803
TcMADS8 TCM_0011825.4337,831.3833762.273.77nucleus1014
TcMADS9 TCM_0013359.1938,279.1533838.4968.11nucleus1017
TcMADS10 TCM_0018419.8531,109.1826962.986.17chloroplast810
TcMADS11 TCM_0054568.9130,163.0826253.6384.89endomembrane system789
TcMADS12 TCM_0054589.0827,810.8424338.4982.3nucleus732
TcMADS13 TCM_0058188.5142,280.2737542.75101.09nucleus1128
TcMADS14 TCM_0063239.4228,732.0225452.2262.24nucleus765
TcMADS15 TCM_0063249.1527,699.7724343.9785.56nucleus732
TcMADS16 TCM_0063255.4224,228.1221845.8955.96nucleus657
TcMADS17 TCM_0073249.5220,330.4517448.32100.29chloroplast525
TcMADS18 TCM_0073788.9624,754.1221951.1385.11chloroplast660
TcMADS19 TCM_0077137.7427,574.323366.5480.73chloroplast702
TcMADS20 TCM_0077879.1236,022.5731046.3192.13nucleus933
TcMADS21 TCM_0087038.529,428.2925846.9982.79nucleus777
TcMADS22 TCM_0087165.9222,921.2119956.2689.65nucleus600
TcMADS23 TCM_0089739.398995.457851.8996.15chloroplast237
TcMADS24 TCM_0114758.9728,016.0824157.3680.17nucleus726
TcMADS25 TCM_0114786.6127,447.9624058.3873.62nucleus723
TcMADS26 TCM_0116876.3339,385.4835147.7378.66nucleus1056
TcMADS27 TCM_0124896.8523,710.1521040.5271.57nucleus633
TcMADS28 TCM_0140516.1341,766.3937051.4677.22chloroplast1113
TcMADS29 TCM_0143378.7946,247.6140752.1488.87nucleus1224
TcMADS30 TCM_0143459.0627,737.4723950.1776.78nucleus720
TcMADS31 TCM_0146619.8324,429.0721055.0786.33chloroplast633
TcMADS32 TCM_0150448.8227,249.0223653.7686.78nucleus711
TcMADS33 TCM_0150499.8827,106.3823747.2390.13chloroplast714
TcMADS34 TCM_0156745.4727,657.4523869.787.65nucleus717
TcMADS35 TCM_0161479.2424,830.3921563.3173.95nucleus648
TcMADS36 TCM_0172428.5124,320.8220947.0886.75nucleus630
TcMADS37 TCM_0189799.0727,740.6324334.0685.14nucleus732
TcMADS38 TCM_0189818.7728,366.1424862.8680.24nucleus747
TcMADS39 TCM_0193628.2330,811.5826758.2470.15nucleus804
TcMADS40 TCM_0210509.717,902.0715545.2989.94chloroplast468
TcMADS41 TCM_0229939.5140,637.6335653.1676.71chloroplast1071
TcMADS42 TCM_0230069.238,815.1935458.3670.54nucleus1065
TcMADS43 TCM_0230418.9338,451.3735459.0169.49nucleus1065
TcMADS44 TCM_0245796.0428,901.425261.3985.48nucleus759
TcMADS45 TCM_0256708.8626,594.6523364.6674.12chloroplast702
TcMADS46 TCM_0256719.3726,384.4323363.0270.39nucleus702
TcMADS47 TCM_0256749.6416,948.7315036.0379.4nucleus453
TcMADS48 TCM_02567610.2911,744.910356.2668.25chloroplast312
TcMADS49 TCM_0268429.2630,49927346.0671.87nucleus822
TcMADS50 TCM_0268459.4723,787.5220749.1578.74chloroplast624
TcMADS51 TCM_0292344.9119,812.0517442.7176.21nucleus525
TcMADS52 TCM_0295189.5220,156.218244.4579.84chloroplast549
TcMADS53 TCM_0295199.6449,499.2443750.1568.56nucleus1314
TcMADS54 TCM_0295969.6834,012.7629466.0374.05nucleus885
TcMADS55 TCM_0324029.2513,764.4513234.3860.68nucleus399
TcMADS56 TCM_0324037.7419,620.3417250.9874.24chloroplast519
TcMADS57 TCM_0341489.0826,051.722533.2188.36chloroplast678
TcMADS58 TCM_0345017.6425,504.1622755.8191.06nucleus684
TcMADS59 TCM_0345499.6221,512.8418450.388.42chloroplast555
TcMADS60 TCM_0347575.4537,593.5433359.5284.83nucleus1002
TcMADS61 TCM_0349705.2638,476.1633760.1882.43nucleus1014
TcMADS62 TCM_0352128.8724,432.8720961.7578.42nucleus630
TcMADS63 TCM_0364739.3418,742.0416241.1999.38chloroplast489
TcMADS64 TCM_0365419.4325,550.2422261.6191.4chloroplast669
TcMADS65 TCM_0365689.5223,170.7820356.6287nucleus612
TcMADS66 TCM_0373949.6221,551.0118642.992.31chloroplast561
TcMADS67 TCM_0407359.7624,088.721439.1887.01chloroplast645
TcMADS68 TCM_0427994.8426,046.3923357.568.28nucleus702
TcMADS69 TCM_0428484.8126,121.6923355.4175.41nucleus702

3.2. Phylogenetic Analysis and Classification of the MADS-Box Gene

To understand the phylogenetic relationship among MADS-box genes in the coco tree and group them into the established subfamilies, we employed MEGA X to construct a rooted neighbor-joining phylogenetic tree based on the amino acid sequence alignment of 69 proteins from Theobroma cacao and 96 from Arabidopsis (Figure 1) [15], which also allowed for inferences to be made about the possible function of these genes based on Arabidopsis gene function research. According to the general MADS-box gene classification in Arabidopsis, the TcMADS genes were grouped into two types: type I and type II. Then, based on the phylogenetic relationships, the type I MADS-box genes were further subdivided into more detailed subfamilies: Mα (11), Mβ (2), Mγ (14), Mδ (9). The Mβ group has the minimum number of members, 2, while the corresponding group members in Arabidopsis contains 16, which indicates that genes were lost over the development of evolution. and the remaining 32 members were classified as MIKC type II. It is notable that TcMADS39 is not classified into any of these subfamilies; therefore, we group it as UN.
Figure 1

Phylogenetic tree of MADS-box genes in Arabidopsis and Theobroma cacao. The MADS-box genes are indicated with light pink and light green shade for Arabidopsis and Theobroma cacao, respectively. In second (narrow) ring from outside, the size of the area represented by the two colors shows the proportion of genes from two species in each group. The subgroups are marked by colorful background and circles.

3.3. Conserved Motif and Structure Analysis

To gain insights into the structural diversity and similarity of MADS-box genes in coco tree, we analyzed the intron–exon arrangements and conserved motifs according to their phylogenetic relations. As shown in Figure 2A, we first constructed an individual phylogenetic tree using an NJ method similar to that of the species tree described above, and then mapped their intron–exon structure (Figure 2B). A very striking distribution of introns in the Arabidopsis MADS-box genes was previously reported: the MICK subfamily of TcMADS genes contained multiple introns, as did the Mδ group, whereas the remaining three subfamilies (Mα, Mβ and Mγ) usually had no introns, or only one or two introns. The reason the Mα, Mβ and Mγ groups contain fewer introns might be a differential tendency to lose or acquire introns or a reverse-transcribed origin for the ancestors of the three subfamilies [18]. In our study, the number of introns in TcMADS genes ranged from one (TcMADS14, TcMADS15, TcMADS4, TcMADS63, TcMADS47, TcMADS49) to eighteen (TcMADS7). Furthermore, closely related genes have a similar gene structure, differing only in the length of exons and introns. The shortest TcMADS gene was just 237 bp in length (TcMADS23), while the longest gene was TcMADS7, with a length of 1803 bp.
Figure 2

Phylogenetic relationship, gene structure and conserved motifs of the TcMADS genes. (A) An unrooted NJ tree (left side of the figure) obtained using the MEGA X based on coco tree MADS-box protein sequences. (B) The exon–intro structures of Theobroma cacao MADS-box genes (central of the figure) were displayed by TBtools software. (C) Conserved motif composition of the TcMADS proteins (right side). Detailed information on the ten motifs is provided in Supplementary Table S1.

To further study the characteristics of the MADS-box gene family and the conserved motifs that are shared among different subfamilies in Theobroma cacao, Multiple Expectation Maximization for Motif Elicitation program was used to identify the conserved motifs. A total of 10 conservative motifs were predicted and named from Motif 1 to Motif 10 (Figure 2C). Among these motifs, Motif 1 was prevalent in all genes; it is worth noting that there were only two Motif 1s in TcMADS53. Motif 2 was also present in almost TcMADS genes. Motif 3, Motif 8, Motif 5, Motif 9 and Motif 10 were only observed in the Mγ subfamily, which indicated that they might be unique to the Mγ group. Generally, TcMADS genes of the same subfamily had similar motifs; we speculated that they might have a similar biological function.

3.4. Genome Distribution and Gene Evolution Analysis of TcMADS Genes

According to the location information acquired from genome annotation file downloaded in Ensembl Plants database, 69 TcMADS genes were evenly distributed on 10 chromosomals (Figure 3A) and renamed based on their position on the chromosome. A higher abundance of MADS-box genes (18.84%) of coco tree was observed on chromosome (Chr) I and II, whereas ChrVII, ChrXI, ChrX had only two MADS-box genes (2.90%). As shown in Figure 3B, a chromosomal bias was observed in the distribution of Mγ subfamily, which was mainly confined to ChrV. ChrIII and ChrVIII were both contained nine TcMADS genes. The other MADS-box genes of Theobroma cacao were located as follows: 5, 10 and 4 on ChrIV, ChrV and ChrVI, respectively.
Figure 3

(A) Physical distribution of TcMADS genes among 10 chromosomes. (B) Number of TcMADS subfamily on each chromosome.

Some of the MADS-box genes distribution showed a relatively high density on chromosomes. We screened tandem duplicated gene pairs among sixty-nine TcMADS genes. The analysis showed that three genes (TcMADS41, TcMADS42, TcMADS43) on ChrV are duplications of each other, and two gene pairs were also found on ChrV (TcMADS49&TcMADS50, TcMADS45&TcMADS46) and one pair on ChrII (TcMADS68&TcMADS69). Additionally, the substitution ratio of non-synonymous (Ka) to synonymous (Ks) mutations (Ka/Ks) of above six pairs were calculated. As shown in Table 2, Ka/Ks values of TcMADS43&TcMADS42 and TcMADS43&TcMADS41 > 1, which means that these genes were positively selected over the course of evolution and the new protein functions could be beneficial to the survival and reproduction of the coco tree. The remaining four gene pairs had Ka/Ks < 1, indicating that these duplicated gene pairs evolved under purifying selection.
Table 2

Tandem duplicated gene pairs and their Ka, Ks, Ka/Ks values.

Tandem Duplicated Gene PairsChromosomeKaKsKa/Ks
TcMADS43&TcMADS42ChrV0.1083010.1037581.04379
TcMADS43&TcMADS41ChrV0.2132540.1704081.25143
TcMADS42&TcMADS41ChrV0.2158780.2365280.912695
TcMADS49&TcMADS50ChrV0.0989150.2378760.415826
TcMADS45&TcMADS46ChrV0.0780580.1113670.700902
TcMADS68&TcMADS69ChrII0.0354490.0864530.410042

3.5. Analysis of Putative Promoter Regions in TcMADS Genes

The cis-regulatory elements serve as a molecular switch by binding to transcription factors, which are associated with gene transcription initiation and transcription activity. To explore the putative functions of TcMADS genes, we extracted and examined the 2k bp sequences upstream the transcription start site. Four types of cis-acting elements were present in the promoter regions when submitted to PlantCARE Online program, including a light-responsive element, wound-responsive element, gibberellin-responsive element and auxin-responsive element. These were identified in our study, indicating that TcMADS genes are closely related to abiotic stress response. The distribution of these cis-acting elements on the promoters is shown in Supplementary Figure S1. Light-responsive elements were present in almost all promoter regions of MADS-box genes, with an especially large number in TcMADS28, TcMADS46, TcMADS62, TcMADS65.

4. Discussion

MADS-box proteins are major transcription factors involved in almost every biological process, and a surprising number of them have been systematically identified and analyzed in a variety of species. Although many MADS-box genes have been shown to have conserved functions in flower development and fruit ripening [38,39,40,41], some MADS-box genes have acquired novel functions in specific species during evolution [42]. To date, no detailed analysis of MADS-box genes has been performed in Theobroma coco. A better understanding of this family in terms of their member feature, structure characteristics can provide new ideas for further functional analysis. Compared with previous studies, the number of this family member varies in different species, with 107 in Arabidopsis [18], 105 in populus trichocarpa [19], 48 in pineapple [20], 182 in Saccharum spontaneum [21], 44 in Erigeron breviscapus [19,22], 64 in Salix suchowensis [43]. A total of 69 MADS-box proteins were identified from the coco tree in this study (Table 1), which is less than that in Arabidopsis. One possible explanation for this is that MADS-box genes coco tree may have a higher gene loss rate compared to that of Arabidopsis, indicating an important role of gene duplication over the course of evolution in various species [44]. These 69 TcMADS genes were renamed (TcMADS1-TcMADS69) based on their chromosomal location and further classified two types according to their phylogenetic relationship with Arabidopsis: type I including subclass Mα (11 genes), Mβ (2 genes), Mγ (14 genes), Mδ (9 genes) and type II MIKC (32 genes). The remaining TcMADS gene was classified as group UN. We found that most MADS-box genes belong to the MIKC subfamily, and Theobroma coco had a comparable number of Mδ and Mγ genes but fewer Mα, Mβ and MIKC genes than Arabidopsis, meaning that Arabidopsis may undergo more gene duplication events than Theobroma cacao. The structures of two types of MADS-box genes were obviously different, and MIKC subgroup genes were more conservative compared with other groups. Additionally, Type II genes usually have multiple introns, whereas most Mα, Mβ and Mγ members have fewer or no introns, indicating that these genes may experience more intron loss during gene family diversification. Previous studies proposed that the number of gene introns correlates with the expression level: the fewer the introns, the higher the pression [45,46]. The same pattern of intron–exon structures in type I and type II exists among diverse species including watermelon [47], Brachypodium distachyon [48], rice [49], and lettuce [50]. Overall, genes within the same group are structurally different from other genes; therefore, we speculated that there may be a complicated gene structural evolution in TcMADS genes. Phylogenomic analyses shows that gene and genome duplication events usually contribute to the diversification of the MADS-box transcription factor and play significant roles in shaping the regulatory networks involved in key phenotypic characters [51]. In this study, six tandem-duplicated gene pairs were identified, which all belong to the Mδ subfamily. As ubiquitous genetic components, promoters drive gene transcription and precisely, temporarily and spatially control gene in response to developmental and environmental signals [52]. The cis-acting elements located upstream of the transcription start sites play a vital biological role in regulating gene expression during growth and development [53]. A promoter analysis indicated that TcMADS genes are involved in diverse stress and hormone responses, making it possible to study individual gene function.

5. Conclusions

In this study, a systematic analysis was conducted of the Theobroma MADS-box gene family. Based on the Theobroma cacao genome data, we used HMM profiles to identify 69 MADS-box genes. Sixty-nine MADS-box genes were distributed across 10 chromosomes and phylogenetically classified into six subfamilies, which showed high similarity in terms of gene structure and conserved motifs within the same subfamily. Furthermore, cis-acting elements analysis show that TcMADS genes may be involved in diverse stress responses. In summary, these results provided more information about MADS-box genes and establish a foundation for future study of MADS-box genes in Theobroma cacao.
  53 in total

1.  The genome of Theobroma cacao.

Authors:  Xavier Argout; Jerome Salse; Jean-Marc Aury; Mark J Guiltinan; Gaetan Droc; Jerome Gouzy; Mathilde Allegre; Cristian Chaparro; Thierry Legavre; Siela N Maximova; Michael Abrouk; Florent Murat; Olivier Fouet; Julie Poulain; Manuel Ruiz; Yolande Roguet; Maguy Rodier-Goud; Jose Fernandes Barbosa-Neto; Francois Sabot; Dave Kudrna; Jetty Siva S Ammiraju; Stephan C Schuster; John E Carlson; Erika Sallet; Thomas Schiex; Anne Dievart; Melissa Kramer; Laura Gelley; Zi Shi; Aurélie Bérard; Christopher Viot; Michel Boccara; Ange Marie Risterucci; Valentin Guignon; Xavier Sabau; Michael J Axtell; Zhaorong Ma; Yufan Zhang; Spencer Brown; Mickael Bourge; Wolfgang Golser; Xiang Song; Didier Clement; Ronan Rivallan; Mathias Tahi; Joseph Moroh Akaza; Bertrand Pitollat; Karina Gramacho; Angélique D'Hont; Dominique Brunel; Diogenes Infante; Ismael Kebe; Pierre Costet; Rod Wing; W Richard McCombie; Emmanuel Guiderdoni; Francis Quetier; Olivier Panaud; Patrick Wincker; Stephanie Bocs; Claire Lanaud
Journal:  Nat Genet       Date:  2010-12-26       Impact factor: 38.330

Review 2.  MIKC-type MADS-domain proteins: structural modularity, protein interactions and network evolution in land plants.

Authors:  Kerstin Kaufmann; Rainer Melzer; Günter Theissen
Journal:  Gene       Date:  2005-02-22       Impact factor: 3.688

3.  Two ancient classes of MIKC-type MADS-box genes are present in the moss Physcomitrella patens.

Authors:  Katrin Henschel; Rumiko Kofuji; Mitsuyasu Hasebe; Heinz Saedler; Thomas Münster; Günter Theissen
Journal:  Mol Biol Evol       Date:  2002-06       Impact factor: 16.240

4.  The Polycomb-group protein MEDEA regulates seed development by controlling expression of the MADS-box gene PHERES1.

Authors:  Claudia Köhler; Lars Hennig; Charles Spillane; Stephane Pien; Wilhelm Gruissem; Ueli Grossniklaus
Journal:  Genes Dev       Date:  2003-06-15       Impact factor: 11.361

5.  Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world.

Authors:  Lucie Parenicová; Stefan de Folter; Martin Kieffer; David S Horner; Cristina Favalli; Jacqueline Busscher; Holly E Cook; Richard M Ingram; Martin M Kater; Brendan Davies; Gerco C Angenent; Lucia Colombo
Journal:  Plant Cell       Date:  2003-07       Impact factor: 11.277

6.  Effect of 5'UTR introns on gene expression in Arabidopsis thaliana.

Authors:  Betty Y W Chung; Cas Simons; Andrew E Firth; Chris M Brown; Roger P Hellens
Journal:  BMC Genomics       Date:  2006-05-19       Impact factor: 3.969

7.  Genome-wide identification and characterization of the MADS-box gene family in Salix suchowensis.

Authors:  Yanshu Qu; Changwei Bi; Bing He; Ning Ye; Tongming Yin; Li-An Xu
Journal:  PeerJ       Date:  2019-11-07       Impact factor: 2.984

8.  MUSCLE: a multiple sequence alignment method with reduced time and space complexity.

Authors:  Robert C Edgar
Journal:  BMC Bioinformatics       Date:  2004-08-19       Impact factor: 3.169

9.  Genome-Wide Computational Identification of Biologically Significant Cis-Regulatory Elements and Associated Transcription Factors from Rice.

Authors:  Chai-Ling Ho; Matt Geisler
Journal:  Plants (Basel)       Date:  2019-10-23

10.  Genome-Wide Analysis of the NAC Domain Transcription Factor Gene Family in Theobroma cacao.

Authors:  Shiya Shen; Qianru Zhang; Yu Shi; Zhenmei Sun; Qianqian Zhang; Sijia Hou; Rongling Wu; Libo Jiang; Xiyang Zhao; Yunqian Guo
Journal:  Genes (Basel)       Date:  2019-12-28       Impact factor: 4.096

View more
  1 in total

1.  Genome-Wide Identification and Analysis of the R2R3-MYB Gene Family in Theobroma cacao.

Authors:  Junhong Du; Qianqian Zhang; Sijia Hou; Jing Chen; Jianqiao Meng; Cong Wang; Dan Liang; Rongling Wu; Yunqian Guo
Journal:  Genes (Basel)       Date:  2022-09-01       Impact factor: 4.141

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.