Literature DB >> 35591925

Evolution and functional diversification of R2R3-MYB transcription factors in plants.

Yun Wu1,2, Jing Wen3, Yiping Xia2, Liangsheng Zhang2, Hai Du3.   

Abstract

R2R3-MYB genes (R2R3-MYBs) form one of the largest transcription factor gene families in the plant kingdom, with substantial structural and functional diversity. However, the evolutionary processes leading to this amazing functional diversity have not yet been clearly established. Recently developed genomic and classical molecular technologies have provided detailed insights into the evolutionary relationships and functions of plant R2R3-MYBs. Here, we review recent genome-level and functional analyses of plant R2R3-MYBs, with an emphasis on their evolution and functional diversification. In land plants, this gene family underwent a large expansion by whole genome duplications and small-scale duplications. Along with this population explosion, a series of functionally conserved or lineage-specific subfamilies/groups arose with roles in three major plant-specific biological processes: development and cell differentiation, specialized metabolism, and biotic and abiotic stresses. The rapid expansion and functional diversification of plant R2R3-MYBs are highly consistent with the increasing complexity of angiosperms. In particular, recently derived R2R3-MYBs with three highly homologous intron patterns (a, b, and c) are disproportionately related to specialized metabolism and have become the predominant subfamilies in land plant genomes. The evolution of plant R2R3-MYBs is an active area of research, and further studies are expected to improve our understanding of the evolution and functional diversification of this gene family.
© The Author(s) 2022. Published by Oxford University Press on behalf of Nanjing Agricultural University.

Entities:  

Year:  2022        PMID: 35591925      PMCID: PMC9113232          DOI: 10.1093/hr/uhac058

Source DB:  PubMed          Journal:  Hortic Res        ISSN: 2052-7276            Impact factor:   7.291


Introduction

The V-myb avian myeloblastosis viral oncogene homolog (MYB) genes encode a group of pan-eukaryotic transcription factors (TFs) characterized by a highly conserved N-terminal DNA-binding domain repeat (R) plus a variable C-terminal regulatory region [1-6]. MYB genes constitute the second largest TF superfamily in flowering plants [7], accounting for ~13% of the 1500 TFs in the model plant Arabidopsis thaliana [8]. Conversely, only a few MYB genes are present in unikonts [9, 10]; for example, only three have been reported in the Homo sapiens genome, the human c-MYB proto-oncogene product (c-MYB) and two related vertebrate MYB factors [11, 12]. Each MYB repeat contains approximately 50 amino acid residues forming three α-helices and special DNA motifs, including MYB binding site I (MBSI), MBSII, or MBSIIG [12-15], with some exceptions [16]. According to sequence similarity, MYB domain repeats are classified as R1, R2, or R3 type [5, 17]. Typically, plant MYB superfamily members include 1–4 imperfect MYB repeat(s) and are therefore subdivided into four families: 1R-MYB (consisting of one or two separated repeats), 2R-MYB (R2R3-MYB, consisting of two adjacent repeats), 3R-MYB (consisting of three adjacent repeats), and 4R-MYB (consisting of four adjacent repeats) [5]. The R2R3-MYB family has expanded substantially in the plant lineage and is the predominant family [6, 18, 19]. Since the first plant R2R3-MYB gene, COLORED1 (C1), was identified in Zea mays and was demonstrated to be essential for anthocyanin biosynthesis in aleurone tissues [20], a tremendous number of R2R3-MYB genes (R2R3-MYBs) have been identified. In the past decade, based on substantial genome sequence data, the whole R2R3-MYB gene family has been comprehensively identified in numerous plant genomes (Supplemental Table 1), with counts ranging from dozens to hundreds, e.g. 126 genes in A. thaliana [9], 192 in Populus [21], 244 in soybean [22], and 406 in Gossypium hirsutum [23]. A comparative phylogenetic analysis of R2R3-MYBs revealed considerable diversification and conservation of this gene family in plants [10]. Concomitantly, the functions and characteristics of R2R3-MYB proteins have been studied extensively (Supplemental Table 2), including their roles in various plant-specific processes, such as responses to ambient stimuli [24-29], specialized metabolism [5, 20, 30–34], development [35-38], and cell differentiation [39-41]. Given their important roles in plants, the separate biological processes regulated by R2R3-MYBs have recently been reviewed in detail elsewhere [42-47]. A number of studies have shown that this gene family underwent a large expansion with functional diversification in land plants [6, 9, 10, 18, 19, 21, 49]. This may have contributed to the origin and diversity of this kind of plant, as it conferred on them the ability to adapt from aquatic to terrestrial environments during evolution [50]. To capture the most significant developments in this area, this review focuses on (a) the identification and classification of the R2R3-MYB gene family in plants, (b) the characterization of conserved and diverse functions in different species and biological processes, and (c) the use of comparative genomics to clarify the expansion and functional diversification of the R2R3-MYB family. Finally, we discuss new research directions to improve our understanding of R2R3-MYB evolution and functions.

Genome-wide characteristics and evolution of the R2R3-MYB gene family in plants

Identification and classification of R2R3-MYBs

Since its initial identification in 1982 [1], the R2R3-MYB family has been an important area of research. In the past 20 years, an increasing number of sequenced plant genomes have enabled integrative genome-wide overviews of this gene family in the plant kingdom. To date, the R2R3-MYB gene family has been evaluated in about 74 plant species, ranging from aquatic plants (such as Mesostigma viride [51]) to angiosperms (e.g. A. thaliana [9], Z. mays [49], Populus trichocarpa [21], Gossypium spp. [23], and Solanum tuberosum [52]) (Fig. 1 and Supplemental Table 1), and a pipeline for the automatic identification of MYBs has been created in order to make genome- or transcriptome-wide investigations more consistent [53]. These studies have revealed that R2R3-MYBs are widely distributed in the plant kingdom, with a sharp increase in number from aquatic to angiosperm plants, forming one of the largest and most diverse TF gene families (ranging from one to hundreds of members) in land plants, especially angiosperms (Fig. 1).
Figure 1

Phylogenetic relationships of 75 plant species in which the R2R3-MYB gene family has been comprehensively analyzed. Phylogenetic relationships among these species were obtained from the NCBI taxonomy database (https://www.ncbi.nlm.nih.gov/taxonomy). In total, 10 270 sequences of 75 plant species from Rhodophyta to angiosperms have been reported. Numbers in round brackets indicate the number of R2R3-MYB family members in each species. Detailed information, including corresponding references for each species, is provided in Supplemental Table 1. The end date for these statistics is November 2021.

Phylogenetic relationships of 75 plant species in which the R2R3-MYB gene family has been comprehensively analyzed. Phylogenetic relationships among these species were obtained from the NCBI taxonomy database (https://www.ncbi.nlm.nih.gov/taxonomy). In total, 10 270 sequences of 75 plant species from Rhodophyta to angiosperms have been reported. Numbers in round brackets indicate the number of R2R3-MYB family members in each species. Detailed information, including corresponding references for each species, is provided in Supplemental Table 1. The end date for these statistics is November 2021. The R2R3-MYB gene family clearly underwent functional diversification to form different subfamilies/groups during evolution. Accordingly, there are many classification schemes for plant R2R3-MYBs, and the differences primarily reflect the existence of lineage- or species-specific “orphan” genes [54, 55], differences in sampling coverage, and phylogenetic analysis methods [6]. On the whole, three landmark classification systems have been developed and improved. First, the canonical classification and nomenclature of this gene family were initially established in A. thaliana [9] and included 25 groups based on the MYB domain and the amino acid motifs in the C-terminal regions [5]. Thereafter, many studies have applied this system with varying degrees of modification [22, 49, 56–58]. Notably, in A. thaliana [9], around 30 of the 126 (~23.81%) R2R3-MYBs were not assigned to any subfamily/subgroup in this classification owing to the lack of different representative species and the limited gene number (generally only one per species), resulting in scattered studies with relatively small sample sizes. Second, we performed a systematic comparative analysis of 50 major eukaryotic lineages with a total of 1548 R2R3-MYBs [10]. As expected, we not only confirmed the 25 well-defined subfamilies of the R2R3-MYB gene family in A. thaliana [9] but also defined many new species- or lineage-specific subfamilies in major eukaryotic lineages that had previously been neglected [21]. Ultimately, we classified the gene family into as many as 73 subfamilies supported by highly conserved gene structures and motif compositions [10]. The validity of this classification was further supported by our recent study of Brassicaceae [18]. In addition, a more systematic classification with a high resolution based on 87 species and 4312 sequences has recently been released, covering the major lineages of Archaeplastida. In that study, ten clades were designated as land plant R2R3-MYB subfamilies by applying a progressive phylogenetic analysis strategy aimed at eliminating the imbalance in resolution and species sampling [6]. We speculated that there may be more subfamilies specific to individual species or lineages, as there are still many “orphan” genes in the phylogenetic analyses. Third, the genome sequences for Marchantia polymorpha [59] and M. viride [51] have enabled the development of a macroscopic classification regime, resulting in three clades of R2R3-MYBs that correspond to their biological functions. Each clade is correlated with different functions associated with essential developmental processes, from the basic life cycle to specialized metabolism and organismal complexity, corresponding to a very recent classification in 2020 [60]. In that study, genes encoding R2R3-MYBs from Rhodophytes, Glaucophytes, Chromista, Chlorophytes, Charophytes, and Embryophytes were divided into three major clades (I, II, and III) (named subgroups in the original paper). Subgroup I is the most ancient group and includes members from all plant lineages, whereas subgroup III has become predominant in land plants [60]. The former two classification schemes share overlaps and are based on domain conservation, whereas this final classification system provides a macroscopic perspective based on broad evolutionary patterns. Notably, a uniform nomenclature within the classification of this gene family in plants is still lacking. Currently, there is almost no dispute at the family (R2R3-MYB) and superfamily (MYB) levels, except for a few studies that refer to R2R3-MYBs as a subfamily or subgroup [50, 61]. However, the nomenclature after R2R3-MYB is inconsistent, e.g. ~7% of studies refer to groups [62, 63], ~31% refer to subfamilies [10, 64], and ~54% refer to subgroups [23, 65], with clade or cluster used occasionally [66, 67]. Subfamilies and subgroups are the most common subsequent appellations after family. Given that subfamily is used after the family level in many other TF gene families, e.g. basic helix–loop–helix proteins (bHLHs) [68], MADS-Box [69], and GRAS [70], we propose an improved nomenclature system from superfamily (MYB) and family (R2R3-MYB) to subfamily. “Subgroup” used by Chang and colleagues should be revised to evolutionary clade/group [60]. To avoid confusion, in the following sections, we uniformly refer to the lower classifications within the gene family as subfamilies (abbreviated as S1, S2, etc.).

Structural conservation of R2R3-MYBs

The R2R3-MYB protein consists of two major functional parts: a DNA-binding domain (MYB domain) located at the N terminus and a regulatory region (non-MYB region) located at the C terminus. The MYB domain is a signature, highly conserved feature in the gene family, whereas the non-MYB regions have diverged among plant species (Fig. 2a). In addition, the intron patterns in the MYB domains of the R2R3-MYBs are highly conserved within each subfamily across land plants [6, 9, 10, 22, 49]. In most cases, the MYB domain has one or two conserved intron insertion sites, with rare intron-less and multi-intron genes [6, 10]. Although the number of intron patterns differs slightly among taxonomic groups, there is clearly high structural conservation of MYB domains across land plants. In our previous study, we identified 12 highly conserved intron patterns across the plant kingdom [10] that may have arisen early in the transition from aquatic to terrestrial plants based on conserved intron numbers, insertion positions, and intron phases (Supplemental Fig. 1). The majority (~70%) of the tested R2R3-MYBs share three types (patterns a, b, and c) of the 12 intron patterns (patterns a to l) [10], suggesting biased expansion during evolution. A recent study has shown that some charophyte R2R3-MYBs exhibit intron–exon structures identical to those of their corresponding land homologs in the same subfamily, whereas others show clear differences, indicating a difference among ancestral genes [6].
Figure 2

Domain structure and functional characterization status of R2R3-MYB transcription factors. a. The R2R3-MYB transcription factor is composed of a MYB region and a non-MYB region. The DNA-binding domain, also called the MYB region, contains conserved R2 and R3 repeats. In most activators and some repressors [43], there is a conserved bHLH-interacting motif ([D/E] Lx2[R/K]x3Lx6Lx3R) within the first two helixes of the R3 domain that enables interactions with bHLH proteins [48] to form the MYB-bHLH-WDR transcriptional complex. The protein sequences in the C-terminal region often show divergence, with one or a few typical repressor motif(s) such as the EAR motif (ERF-associated amphiphilic repression), SID motif (Sensitive to ABA and Drought 2 protein interact motif), and TLLLFR. H, Helix; T, turn; W, tryptophan; X, amino acid. * indicates that the motif is not included in all R2R3-MYBs, but only in some. b. Number of species for which the functions of one or more R2R3-MYB genes were identified as of 2020. In the last decade (2011–2020), there was explosive growth in the breadth of taxa for which R2R3-MYB data were available. It is noteworthy that most of the increase in species number is ascribed to the vast number of orthologous MYB genes characterized in horticultural plants that have similar functions, especially in phenylpropanoid biosynthesis. c. Number of Arabidopsis R2R3-MYBs for which biological role(s) have been identified. In Arabidopsis, all the increase is due to paralogs with new functions. When a gene had different functions published in different years, we sorted it into the year of the first publication. The dark grey bar represents the number of R2R3-MYBs with unknown (i.e. not experimentally verified) functions.

Domain structure and functional characterization status of R2R3-MYB transcription factors. a. The R2R3-MYB transcription factor is composed of a MYB region and a non-MYB region. The DNA-binding domain, also called the MYB region, contains conserved R2 and R3 repeats. In most activators and some repressors [43], there is a conserved bHLH-interacting motif ([D/E] Lx2[R/K]x3Lx6Lx3R) within the first two helixes of the R3 domain that enables interactions with bHLH proteins [48] to form the MYB-bHLH-WDR transcriptional complex. The protein sequences in the C-terminal region often show divergence, with one or a few typical repressor motif(s) such as the EAR motif (ERF-associated amphiphilic repression), SID motif (Sensitive to ABA and Drought 2 protein interact motif), and TLLLFR. H, Helix; T, turn; W, tryptophan; X, amino acid. * indicates that the motif is not included in all R2R3-MYBs, but only in some. b. Number of species for which the functions of one or more R2R3-MYB genes were identified as of 2020. In the last decade (2011–2020), there was explosive growth in the breadth of taxa for which R2R3-MYB data were available. It is noteworthy that most of the increase in species number is ascribed to the vast number of orthologous MYB genes characterized in horticultural plants that have similar functions, especially in phenylpropanoid biosynthesis. c. Number of Arabidopsis R2R3-MYBs for which biological role(s) have been identified. In Arabidopsis, all the increase is due to paralogs with new functions. When a gene had different functions published in different years, we sorted it into the year of the first publication. The dark grey bar represents the number of R2R3-MYBs with unknown (i.e. not experimentally verified) functions. R2R3-MYB proteins can act as transcriptional activators as well as repressors [42]. Most are activators that bind to cis-elements in gene promoters to recruit transcriptional machinery and stimulate gene expression [43]. For instance, many R2R3-MYBs have a bHLH-interacting motif “(D/E)Lx2(R/K)x3Lx6Lx3R” in the R3 MYB domain (Fig. 2a), enabling physical interactions with the bHLH protein to form the MYB-bHLH-WDR (MBW) transcriptional complex [48]. Notably, although the conservation of the non-MYB region is much lower than that of the MYB domain, the C-terminal regions of closely related members within each subfamily or close subfamilies in an evolutionary clade generally share short conserved protein motifs (some called short linear motifs, SLiMs), such as EAR (ethylene-responsive element binding, “pdLNL(D/E)L” or “DLNxxP” in S4) [71], “(R/K)PRPRx(F/L)” in S6, SID (sensitive to ABA and Drought 2 protein interact motif), and TLLLFR [72] (Fig. 2a), indicating their common origin, functional similarities, and structural conservation [9]. Recently, Millard et al. [54] reported that the interactions between MYB TFs from S12 and bHLH were mediated by one of the conserved motifs ([L/F]LN[K/R]VA) located in the center of the non-MYB region. Two novel SLiMs with roles in protein–protein interaction were uncovered by Rodrigues et al. [73] in the disordered C-terminal region. For more detailed information, see the recent reviews by LaFountain and Yuan [47] and Chen et al. [35]. MYB activators and repressors often operate in a hierarchy [74]. Furthermore, many subfamily- and/or branch-specific motifs have been detected within R2R3-MYBs. For instance, based on a large number of species, up to 102 highly conserved motifs have been identified in non-MYB regions [10]. Interestingly, non-MYB regions across the entire A. thaliana R2R3-MYB gene family contain extensive intrinsically disordered regions (IDRs) with special post-translational modification sites, binding sites for physical interactions, and activation domains. These features may explain the substantial functional diversity and the structure–function relationships in plant R2R3-MYBs [54]. However, owing to the high divergence in non-MYB regions, a systematic analysis of motif(s) and/or IDRs covering broad and representative plant lineages is still lacking. Moreover, further experimental evidence is needed, including for IDRs.

Functional conservation and diversification of R2R3-MYBs in land plants

The biological functions of R2R3-MYBs have been verified in an increasing number of non-model crops with high economic value, especially in the last decade. More than 500 R2R3-MYBs have been functionally characterized in around 130 plant species (Fig. 2b and Supplemental Table 2), and their roles in many biological processes have been reported [5, 27–29, 32–34, 38, 41, 75–77]. Notably, research progress related to R2R3-MYB gene function is based mainly on studies of the model plant A. thaliana; approximately 80% of its 126 R2R3-MYBs have been experimentally characterized (Fig. 2c). Overall, the functions of R2R3-MYBs can be classified into three major processes: development and cell differentiation, specialized metabolism (especially the phenylpropanoid biosynthesis pathway), and stress responses (biotic and abiotic stresses).

R2R3-MYBs are required for developmental and cell differentiation processes

Plant R2R3-MYBs are involved in the regulation of many important aspects of plant growth and development (Supplemental Table 2), e.g. trichome initiation and branching [78, 79], cuticle development [80], and regulation of different organs, including the flower, seed [81, 82], shoot [41, 83, 84], and root [85, 86]. In general, genes from the S9 (e.g. AtMYB16), S15 (e.g. AtMYB66), S18 (e.g. AtMYB33), S25 (e.g. AtMYB64), and S27 (e.g. AtMYB91) subfamilies [9] are exclusively associated with the above-mentioned biological functions (Supplemental Table 2). For instance, genes in S9 are involved in cuticle formation and trichome branching in A. thaliana (AtMYB16 and AtMYB106) [80] and M. polymorpha (MpSBG9) [87], conical epidermal cell outgrowth in Antirrhinum majus (AmMYBML1) [88], photosynthesis and growth in Betula platyphylla (BpMYB106) [89], and fiber development in cotton (GhMML3) [90]. The genes in S18 mainly regulate the development of flower organs such as stamens (AtMYB33, AtMYB65, and BcMF28) [91, 92] and anthers (OsGAMYB) [93], as well as pollen tube–synergid interactions (AtMYB97, AtMYB101, and AtMYB120) [94]. Interestingly, the above processes are usually mediated by hormones and/or hormone signaling, e.g. JA and GA. R2R3-MYBs (e.g. members of S25) are also involved in plant reproductive growth [95] (Supplemental Table 2). More recently, MpFGMYB, a member of S25, was identified as a critical factor in sexual dimorphism determination in M. polymorpha [96], and similar results regarding spore and sexual organ development were obtained in Selaginella moellendorffii (SmGAMB) and Physcomitrella patens (PpGAMYB1 and PpGAMYB2) [97]. Members of S27 contribute to the regulation of vegetative organ formation, including leaf patterning and development (NbPHAN and PHANTASTICA) [98, 99], stomatal cell differentiation and formation [16], and axillary meristem formation (ZmRS2 and SlPHAN) [100-102] (Supplemental Table 2). S14 also functions in meristem initiation as well as root formation (CRY1) [103, 104] (Supplemental Table 2). A good example in this subfamily is the MpGCAM1 gene, which precisely controls gemma cup formation in liverworts [105]. Another example of genes that regulate cell growth during gametophytic development are MpPp1 and MpPp2 in P. patens [106]. Furthermore, some subfamilies have species-specific functions. For instance, AtMYB21, AtMYB24, and AtMYB57 in S19 have key roles in stamen development [107, 108]. However, their homologs in almost all other species studied, including Freesia hybrids [109, 110], Gentian tutea [111], Ficus carica [112], Malus domestica [113], and Petunia hybrida [114], regulate specialized metabolism (Supplemental Table 2). Further research should focus on S16 [115], S24 [116], S28 [117], S31 [118], S33 [119], S36 [120], and S37 [121, 122], for which experimental data are rather limited (Supplemental Table 2). In summary, R2R3-MYBs are a class of important TFs necessary for the basic life cycle of plants. Therefore, there is a need for more extensive field studies of R2R3-MYBs in the future to determine their functions in basic developmental processes.

R2R3-MYBs respond to various environmental stresses

Over the past few decades, R2R3-MYBs have emerged as key regulators of responses to diverse abiotic stresses, such as drought, temperature, salinity, and phosphate starvation [28, 123–127] (Supplemental Table 2); see previous reviews [75, 128, 129]. Representative subfamilies are S1, S2, S11, S17, S20, S22, and S38. In A. thaliana, members of S1 (AtMYB30, AtMYB60, and AtMYB96) are involved in drought, heat, salt, and excess-light stresses [126, 130–137]; members of S2 (AtMYB13, AtMYB14, and AtMYB15) act as positive regulators of drought, wounding, and freezing tolerance [138-141]; AtMYB41 and AtMYB74 in S11 are involved in osmotic and/or salt stress [142, 143]; members of S20 (AtMYB2, AtMYB62, and AtMYB108) are responsive to a wide range of environmental stresses [144-146]; similar roles have been observed for S22 members (AtMYB44, AtMYB73, and AtMYB77) [26, 147, 148]. AtMYB48 in S38 is involved in potassium stress possibly via alternative splicing, which modulates DNA-binding motifs [149]. Some genes from S3 [150], S4 [125], S8 [151], and S23 [152] also show stress-induced functionality in A. thaliana. In addition to the R2R3-MYBs in A. thaliana, around 70 genes identified in 26 species have also been found to participate in distinct abiotic stresses (Supplemental Table 2). As in A. thaliana, such proteins are abundant in S2 [153-155], S11 [156-158], S20 [159-161], and S22 [162-165]. However, distinct differences have been noted. Some groups have been confirmed only in A. thaliana (e.g. AtMYB1 from S23a and AtMYB72 from S3) [150, 152] or only in non-model species, such as S44, S4, S17, S28, S43, S7, S39, and S79 [123, 127, 166–178]. Taken together, these results suggest that R2R3-MYBs control responses to multiple abiotic stresses and act as major regulators. However, more research is needed to improve our understanding of the functional divergence and conservation of this gene family in plants. This understanding could provide a theoretical basis for the development of strategies to resolve the adverse effects of abiotic stresses, which are exacerbated by climate change. Many R2R3-MYBs contribute to the biotic stress response, and at least 13 R2R3-MYBs have been reported to date (Supplemental Table 2). Interestingly, eight of these genes belong to S2. OsMYB30 directly upregulates OsPAL6 and OsPAL8 to enhance resistance to the brown planthopper in rice [29]. The overexpression of CmMYB15 in chrysanthemum confers resistance to aphids by regulating lignin biosynthesis [179]. GmMYB29A2 regulates soybean resistance to Phytophthora sojae [180]. SlHM1 and SlMYB52 alter trichome density to confer spider mite tolerance in tomato via auxin signaling [118]. VdMYB1 in Vitis davidii [181] and TaRIM1 in wheat regulate pathogen defense [182]. Other examples include AtMYB30 (S1) [183], AtMYB96 (S1) [184], AtMYB102 (S11) [185], and AtMYB72 (S3) [186, 187] in A. thaliana and OsJAmyb (S39) in Oryza sativa [188]. In summary, R2R3-MYBs could improve plant responses to multiple abiotic and biotic stresses in agriculture and could be targets of advanced technologies, such as CRISPR technology, to improve crop resistance to environmental stimuli in order to maintain the food supply.

Key roles of R2R3-MYBs in metabolite biosynthesis

The principal roles of plant R2R3-MYBs (~70%) reported to date are in the regulation of plant specialized metabolism, including the benzenoid, phenylpropanoid, terpenoid, and glucosinolate (GSL) pathways (Supplemental Table 2). Nearly half of the functionally characterized A. thaliana R2R3-MYBs are related to core and specialized metabolism reactions (Fig. 3 and Supplemental Table 2) [5, 9]. Notably, the majority of these R2R3-MYBs are involved as activators or repressors in the transcriptional regulation of the phenylpropanoid biosynthesis pathway, which gives rise to a class of phenylpropanoid-derived compounds such as flavonoids (proanthocyanidins, anthocyanins, flavones, flavonols, isoflavonoids, and phlobaphenes), lignins, and other general phenylpropanoids (Fig. 3a) [43, 77, 189].
Figure 3

Major metabolite biosynthetic pathways regulated by R2R3-MYBs and simplified schemes are shown. a. The most intensively investigated regulatory metabolites are phenylpropanoid biosynthesis-derived compounds, including flavonoids and lignins. Typical groups of flavonoids are shown in the yellow box, several representative groups for specific branches of flavonoid biosynthesis are shown in the deep yellow box, and typical groups for lignins are shown in the light blue box. More information regarding the phenylpropanoid pathway has been reviewed recently [190]. b. R2R3-MYB regulation of glucosinolate biosynthesis in S12 has primarily been studied in Brassicaceae. The synthesis pathway involves three phases, with two types of starting amino acids, that are regulated by different R2R3-MYBs. A fine review with more details has recently been published by Mitreiter and Gigolashvili [191]. c. The best-studied terpenoids regulated by R2R3-MYBs are floral volatiles.

Major metabolite biosynthetic pathways regulated by R2R3-MYBs and simplified schemes are shown. a. The most intensively investigated regulatory metabolites are phenylpropanoid biosynthesis-derived compounds, including flavonoids and lignins. Typical groups of flavonoids are shown in the yellow box, several representative groups for specific branches of flavonoid biosynthesis are shown in the deep yellow box, and typical groups for lignins are shown in the light blue box. More information regarding the phenylpropanoid pathway has been reviewed recently [190]. b. R2R3-MYB regulation of glucosinolate biosynthesis in S12 has primarily been studied in Brassicaceae. The synthesis pathway involves three phases, with two types of starting amino acids, that are regulated by different R2R3-MYBs. A fine review with more details has recently been published by Mitreiter and Gigolashvili [191]. c. The best-studied terpenoids regulated by R2R3-MYBs are floral volatiles. Flavonoids are a vast group of plant specialized metabolites with a common diphenylpropane (C6-C3-C6) backbone [45] that have important functions as pigments and/or light protectants. Typical subfamilies in the R2R3-MYB gene family that are involved in the regulation of flavonoid biosynthesis are S6, S5, S4, S7, S44, and S79 (Supplemental Table 2). In addition to their roles in A. thaliana, there is substantial emerging evidence for the roles of R2R3-MYBs in the transcriptional regulation of flavonoid metabolism in various plants, including important food crops (41), horticultural crops (fruits, 77; ornamentals, 61; vegetables, 37), economically valuable woody crops (27), and edible and medicinal plants (30). With respect to edible crops, including fruits and vegetables, research has mainly focused on substances such as anthocyanins and flavonols that determine their quality and nutritional properties. However, flower colors are key traits for flower crops [42]. Potato StMYB44 has recently been identified as a repressor of anthocyanin biosynthesis in tuber flesh under high temperatures [192], and whole-genome resequencing-based QTL-seq indicated that AhTc1 controls the purple testa color in peanut [193]. In addition, the overexpression of MdMYB24L resulted in higher anthocyanin contents in transgenic ‘Orin’ apple calli [113]. Similar studies have examined R2R3-MYBs in other fruit trees, including PaMYB10 in apricot [194], AcMYB123 in kiwifruit [32], LcMYB5 in litchi [195], and FcMYB123 in fig [112]. In M. polymorpha, MpMYB14 and MpMYB02 regulate anthocyanin accumulation [196]. Moreover, different types of flavonoid regulation are broadly observed. For instance, regulatory roles have been reported for PhMYB15 and PpMYBF1 in flavonol biosynthesis in peach fruit [197], FhMYB21L2 in flavonol biosynthesis in Freesia [109], CsPH4 in proanthocyanidin biosynthesis in citrus [198], SlMYB72 in carotenoid biosynthesis in tomato [199], CaMYB108 in capsaicinoid biosynthesis in pepper [200], AgMYB1 in apigenin biosynthesis in celery [137], HaMYB111 in floral ultraviolet patterning in sunflower [201], and SmMYB2 in phenolic acid biosynthesis in Salvia miltiorrhiza [202]. Apart from model flowers such as Petunia and Antirrhinum, a number of flowering bulbs have also been studied recently, including Freesia [203, 204], Cattleya [205], and Narcissus [206]. Lignin is another major end-product of the phenylpropanoid pathway and a key component of secondary cell walls (SCWs) in wood; it is polymerized from phenylpropanoid-derived monolignols [207]. The role of R2R3-MYBs in the lignin biosynthesis pathway is a major focus of research. As wood is widely used for pulp, papermaking, and biofuels and wood quality is largely determined by lignin synthesis, most studies of the regulatory effects of R2R3-MYBs on lignin synthesis are based on woody plants. For example, the overexpression of PtoMYB055 upregulates the expression of lignin biosynthetic genes in transgenic poplar, resulting in an increase in the thickness of the SCW [208]. By contrast, PtMYB189 acts as a repressor to regulate SCW biosynthesis in poplar [209]. Likewise, the positive or negative regulation of the SCW by R2R3-MYBs has also been reported in Z. mays [210], G. hirsutum [211], O. sativa [212], Pyrus × bretschneideri [213], and Eriobotrya japonica [214]; however, additional studies are still needed. In addition to phenylpropanoid metabolism, which has been investigated extensively in plants, other representative specialized metabolic compounds have been studied, such as those produced by the GSL and terpenoid pathways (Fig. 3b–c). GSLs have well-established anticarcinogenic and antioxidative effects in humans [215]. Interestingly, consistent with the distribution of GSLs, members of S12 are widely distributed in species from the family Brassicaceae and are predominantly related to GSL biosynthesis in this family (Supplemental Table 2). It has been reported that indolic GSL synthesis is regulated mainly by AtMYB34, AtMYB51, and AtMYB122, whereas aliphatic GSL synthesis is regulated by AtMYB28, AtMYB29, and AtMYB76 [191]. For example, silencing BjMYB28 homologs reduces the seed GSL content in Brassica juncea [216], BoMYB29 increases the methylsulphinyl GSL content in Brassica oleracea var. acephala [217], and BnMYB28.1 was identified by a QTL analysis using near-isogenic lines [218]. These studies provide good examples of the adaptive evolution of the R2R3-MYB gene family in land plants, especially in angiosperm diversification. Some plants can produce and emit fragrant molecules such as benzenoids and terpenoids, especially in flowers, and these processes are strictly regulated at the transcriptional level by R2R3-MYBs. For instance, two R2R3-MYBs, FhMYB21L1 and FhMYB21L2, are expressed synchronously with FhTPS1 and can activate its expression to affect monoterpene synthase synthesis when overexpressed [110]. In addition, BpMYB21 and BpMYB61 in B. platyphylla are involved in triterpenoid synthesis [219]. R2R3-MYBs regulate terpenoid compounds in several medicinal plants, including S. miltiorrhiza [220, 221], Panax ginseng [222], and Mentha spicata [223]. The research summarized above clearly demonstrates that plant R2R3-MYBs, which are related to plant specialized metabolism, are mainly involved in the phenylpropanoid biosynthesis pathway, and each subfamily member has key roles in distinct processes, such as proanthocyanidin or lignin biosynthesis. Accordingly, this gene family has probably contributed to functional evolution in land plants.

The origin and expansion of R2R3-MYBs in land plants are associated with functionality

Gene duplication is a prominent event in plant genome evolution and contributes to the establishment of multi-gene families. Genes can be duplicated by various mechanisms, such as whole genome duplication (WGD), chromosomal segmental duplication, tandem duplication, and retrotransposition. Based on systematic analyses of many land plants, we previously found that WGD/polyploidy events and small-scale duplications (e.g. segmental duplications) accounted for the large expansion of the R2R3-MYB gene family [18, 22, 209]. Similar results have been obtained in studies of broad plant lineages [21, 56, 215, 224]. It is well known that WGD events are a driving force in angiosperm diversification. The tremendous expansion of R2R3-MYBs in land plants is consistent with WGD in angiosperms, indicating an important role for the expansion of R2R3-MYBs in the increased complexity of angiosperms. Small-scale duplications (e.g. tandem/segmental duplications) have also contributed to the rapid expansion and large size of this gene family in land plants, as well as the evolution of novel gene functions [10, 18, 21, 56, 224, 225]. R2R3-MYBs underwent a rapid expansion over the course of plant evolution (i.e. from algae to land plants), resulting in a gradual increase in number together with an increase in organismal complexity. Accordingly, the number of genes in this family in land plants is large, showing a huge expansion after the divergence of angiosperms from other vascular plants [10, 60]. For example, only eight R2R3-MYBs were detected in the single-celled chlorophyte Chlamydomonas reinhardtii, whereas up to 429 R2R3-MYBs were observed in the allotetraploid Brassica napus [18] (Fig. 1). Based on an analyses of 50 eukaryotes [10], the R2R3-MYB gene family was classified into 73 subfamilies, most of which were newly defined (S26–S73), with the exception of the first 25 subfamilies from Arabidopsis [9]. Based on the distribution of subfamily members across the 27 plant genomes investigated in the study [10], it was speculated that a few subfamilies (S18, S21, S22, S26, and S27) were established soon after the origin of land plants and may be the ancestors of land plant R2R3-MYBs, whereas the major subfamilies formed a monophyletic clade in land plants [10]. In summary, plant R2R3-MYBs experienced three major expansion events: one early in the origin of land plants from Chlorophyta, one after the divergence of spermatophytes from vascular plants, and one in the common ancestor of angiosperms before the divergence of monocots and eudicots, forming the majority of R2R3-MYB subfamilies [10, 60]. Notably, members of this gene family exhibit an uneven phylogenetic distribution; the expansion was significantly biased toward subfamilies with three highly homologous intron patterns (a, b, and c) during evolution, resulting in an enormous expansion in the number of R2R3-MYBs in the plant lineage [10, 18] (Fig. 4a and Supplemental Fig. 1). For instance, in the 12 land plants investigated in our previous study, 2R-MYBs with patterns a–c generally accounted for ~66–81% of this gene family in 10 angiosperms, including Z. mays (66%), Arabidopsis (73%), and Vitis vinifera (81%). A similar situation (73%) was found in B. napus [18]. By contrast, the corresponding percentages were 59% in P. patens and 20% in S. moellendorffii [10] (Fig. 4a). Accordingly, up to 43 subfamilies have intron pattern a, and 12 and 19 subfamilies have intron patterns a and b, respectively, and have generally been integrated into the subfamilies with pattern a (Fig. 4b and Supplemental Fig. 2). Novel subfamilies have commonly been derived from these three categories [10, 18]. By contrast, the numbers of genes with the last nine intron patterns (d–l) are relatively conserved in land plants [10, 18] (Fig. 4a). Relatively few genes have the last nine intron patterns (d–l), which generally account for less than 30% of this gene family in most angiosperms [10]. With the exception of patterns d and j, which are shared by 7 and 2 subfamilies, respectively, each of the last 7 patterns are shared by only one subfamily (e.g. pattern e in S21 and pattern f in S19) (Fig. 4b and Supplemental Fig. 2). Thus, the number of genes with patterns d–l is relatively conserved in land plants [10, 18].
Figure 4

Phylogenetic relationships among functionally characterized plant R2R3-MYBs. a. The number and percentage of R2R3-MYBs displaying each intron pattern as shown in Supplemental Fig. 1 in Physcomitrella patens (Pp), Selaginella moellendorfii (Sm), Zea mays (Zm), Oryza sativa (Os), Aquilegia coerulea (Ac), Solanum lycopersicum (Sl), Solanum tuberosum (St), Vitis vinifera (Vv), Arabidopsis thaliana (At), Citrus sinensis (Csi), Medicago truncatula (Mt), Populus trichocarpa (Pt), and Brassica napus (Bna). b. The maximum likelihood (ML) tree was constructed with 100 replications using the JJT + G model. It contains 598 nonredundant R2R3-MYBs, including 126 A. thaliana R2R3-MYBs [9] and 435 functionally characterized R2R3-MYBs from other plant taxa (Supplemental Table 2), as well as representatives of 73 subfamilies from our previous results [10]. The tree was rooted using S29 (the CDC5-like protein) as the outgroup. The scale bar represents 2 substitutions per site. Detailed information on the ML tree is provided in Supplemental Fig. 2. Detailed information on the sequences used for phylogenetic analyses is provided in Supplemental Table 3. The subfamily classification and nomenclature of the 598 proteins were based on those of Arabidopsis [9] and our previous study [10]. The new subfamilies, with the exception of the 25 subfamilies from Arabidopsis [9], are underlined, and their detailed functions and classification information are provided in Supplemental Table 2 and Fig. 2. The colored squares indicate the intron pattern to which each subfamily member belongs (Supplemental Fig. 1). Nodes with bootstrap values ≥70% and ≥50% are shown as black and gray dots, respectively, in the phylogenetic tree.

Phylogenetic relationships among functionally characterized plant R2R3-MYBs. a. The number and percentage of R2R3-MYBs displaying each intron pattern as shown in Supplemental Fig. 1 in Physcomitrella patens (Pp), Selaginella moellendorfii (Sm), Zea mays (Zm), Oryza sativa (Os), Aquilegia coerulea (Ac), Solanum lycopersicum (Sl), Solanum tuberosum (St), Vitis vinifera (Vv), Arabidopsis thaliana (At), Citrus sinensis (Csi), Medicago truncatula (Mt), Populus trichocarpa (Pt), and Brassica napus (Bna). b. The maximum likelihood (ML) tree was constructed with 100 replications using the JJT + G model. It contains 598 nonredundant R2R3-MYBs, including 126 A. thaliana R2R3-MYBs [9] and 435 functionally characterized R2R3-MYBs from other plant taxa (Supplemental Table 2), as well as representatives of 73 subfamilies from our previous results [10]. The tree was rooted using S29 (the CDC5-like protein) as the outgroup. The scale bar represents 2 substitutions per site. Detailed information on the ML tree is provided in Supplemental Fig. 2. Detailed information on the sequences used for phylogenetic analyses is provided in Supplemental Table 3. The subfamily classification and nomenclature of the 598 proteins were based on those of Arabidopsis [9] and our previous study [10]. The new subfamilies, with the exception of the 25 subfamilies from Arabidopsis [9], are underlined, and their detailed functions and classification information are provided in Supplemental Table 2 and Fig. 2. The colored squares indicate the intron pattern to which each subfamily member belongs (Supplemental Fig. 1). Nodes with bootstrap values ≥70% and ≥50% are shown as black and gray dots, respectively, in the phylogenetic tree. As discussed above, plant R2R3-MYBs are crucial for the regulation of many plant-specific processes related to metabolism (22 subfamilies), development (16 subfamilies), and biotic and abiotic stress processes (14 subfamilies) (Fig. 4b and Supplemental Fig. 2). Notably, the expansion of this gene family is closely accompanied by increases in the functional diversity of R2R3-MYBs in the plant kingdom. Interestingly, older subfamilies are generally involved in development and/or stress-related processes, whereas the majority of new/derived subfamilies in angiosperms are related to metabolism (Fig. 4b and Supplemental Fig. 2), suggesting a bias in functional diversification toward specialized metabolism. In particular, the evolution of R2R3-MYBs in plants is related to specific expansions giving rise to species- or lineage-specific subfamilies [224]. For example, 5 of 43 B. napus subfamilies are Brassicaceae specific [18]. Several of the 43 pineapple subfamilies, such as C2, C8, and C22, are likely to be species or lineage specific [64]. Similarly, three dicot-specific and six grass-specific subfamilies have been observed based on a comparative genomic analysis of R2R3-MYBs in Arabidopsis, poplar, rice, maize, and switchgrass [226]. Although the functions of most species- or lineage-specific subfamilies are unknown, the distributions suggest that they have lineage-specific functions. For instance, members of the Brassicaceae-specific S12 subfamily (e.g. AtMYB28/HAG1, AtMYB29/HAG3, AtMYB34, AtMYB51/HIG1, AtMYB76/HAG2, and AtMYB122 in A. thaliana) are regulators of Brassicaceae-specific GSL biosynthesis (Supplemental Table 2 and Supplemental Fig. 2). Three expanded subfamilies in Eucalyptus grandis, including a significantly higher number in woody perennial species (E. grandis, V. vinifera, and P. trichocarpa), and five subfamilies preferentially found in woody plants are potentially involved in cambium-derived woody growth [224]. Several sugar beet MYBs with an atypical amino acid composition in the R3 domain are species-specific regulators of the betalain red pigment pathway [227, 228]. Together, these results indicate an obvious expansion and functional trend toward specialized metabolism in this gene family during angiosperm evolution.

Concluding remarks and prospects

R2R3-MYBs are ubiquitous in eukaryotes and are not plant specific, but the numbers of this type of MYB gene are generally higher in plants (especially in angiosperms) than in other eukaryotes [10], and they constitute one of the largest TF gene families in plant genomes. As the number of sequenced plant genomes has increased, the plant R2R3-MYB gene family has been systematically identified and analyzed in about 74 species at a genome-wide level (Fig. 1). These data have substantially improved our understanding of R2R3-MYB distribution, origin, classification, expansion, and evolutionary mechanisms in plants. As sessile organisms, plants have had to adapt to constantly changing environments during evolution. Through time, plants have differentiated into highly complex organisms and have developed a large number of specialized metabolites with beneficial functions for survival and adaptation. R2R3-MYBs have undergone rapid expansion during plant evolution via WGD and small-scale duplications. These drastic expansion/duplication events have played a key role in generating diversity, producing many conserved subfamilies and also lineage-specific subfamilies with specific biological functions (e.g. S12 in GSL biosynthesis), especially those related to specialized metabolism. About 70% of functionally characterized R2R3-MYBs and most subfamilies are related to specialized metabolic processes focused on phenylpropanoid-derived secondary metabolites (Fig. 4b and Supplemental Table 2). The rapid expansion and diversification of R2R3-MYBs have been ongoing processes throughout the evolution of angiosperms. It is not surprising that this large gene family has contributed to the evolution of physiological or developmental processes that are specific to plants. The expansion of R2R3-MYBs in land plants may underpin the emergence of tremendous plant diversity, providing new functional characteristics related to protection against stress and changing environmental conditions. Thus, uncovering the genomic and molecular basis of the origin and evolution of such functional characteristics is a major research goal. Owing to the availability of resources for gene functional analyses (e.g. genomic data, genetic populations, and experimental tools), a large proportion of the Arabidopsis R2R3-MYB gene family has been functionally characterized (about 100 of 126 genes). However, our current knowledge of the functions of plant R2R3-MYBs is based mainly on the model plant Arabidopsis, whereas studies of non-traditional models (such as crops) are in an early stage. More information about the functions of R2R3-MYBs in a broader array of plant taxa will undoubtedly improve our understanding of the mechanisms underlying their evolution and functional diversification. Newly developed genomic technologies (e.g. third-generation long-read sequencing technology) combined with traditional and novel molecular technologies (e.g. genome editing tools) offer excellent opportunities for further improving our understanding of the functions and functional diversification of R2R3-MYBs. Click here for additional data file.
  222 in total

Review 1.  The R2R3-MYB gene family in Arabidopsis thaliana.

Authors:  R Stracke; M Werber; B Weisshaar
Journal:  Curr Opin Plant Biol       Date:  2001-10       Impact factor: 7.834

Review 2.  R2R3 MYB Transcription Factors - Functions outside the DNA-Binding Domain.

Authors:  Peter S Millard; Birthe B Kragelund; Meike Burow
Journal:  Trends Plant Sci       Date:  2019-07-26       Impact factor: 18.313

3.  Identification, isolation and expression analysis of eight stress-related R2R3-MYB genes in tartary buckwheat (Fagopyrum tataricum).

Authors:  Fei Gao; Hai-Xia Zhao; Hui-Peng Yao; Cheng-Lei Li; Hui Chen; An-Hu Wang; Sang-Un Park; Qi Wu
Journal:  Plant Cell Rep       Date:  2016-03-28       Impact factor: 4.570

4.  NbPHAN, a MYB transcriptional factor, regulates leaf development and affects drought tolerance in Nicotiana benthamiana.

Authors:  Changjun Huang; Gaojie Hu; Fangfang Li; Yunqin Li; Jianxiang Wu; Xueping Zhou
Journal:  Physiol Plant       Date:  2013-03-15       Impact factor: 4.500

5.  The MYB96 transcription factor mediates abscisic acid signaling during drought stress response in Arabidopsis.

Authors:  Pil Joon Seo; Fengning Xiang; Meng Qiao; Ju-Young Park; Young Na Lee; Sang-Gyu Kim; Yong-Hwan Lee; Woong June Park; Chung-Mo Park
Journal:  Plant Physiol       Date:  2009-07-22       Impact factor: 8.340

6.  The land plant-specific MIXTA-MYB lineage is implicated in the early evolution of the plant cuticle and the colonization of land.

Authors:  Bo Xu; Lin Taylor; Boas Pucker; Tao Feng; Beverley J Glover; Samuel F Brockington
Journal:  New Phytol       Date:  2020-11-08       Impact factor: 10.151

7.  The Wheat MYB Transcription Factor TaMYB31 Is Involved in Drought Stress Responses in Arabidopsis.

Authors:  Yue Zhao; Xiyong Cheng; Xiaodan Liu; Huifang Wu; Huihui Bi; Haixia Xu
Journal:  Front Plant Sci       Date:  2018-09-28       Impact factor: 5.753

8.  Differential regulation of closely related R2R3-MYB transcription factors controls flavonol accumulation in different parts of the Arabidopsis thaliana seedling.

Authors:  Ralf Stracke; Hirofumi Ishihara; Gunnar Huep; Aiko Barsch; Frank Mehrtens; Karsten Niehaus; Bernd Weisshaar
Journal:  Plant J       Date:  2007-04-05       Impact factor: 6.417

Review 9.  Understanding of MYB Transcription Factors Involved in Glucosinolate Biosynthesis in Brassicaceae.

Authors:  Mi-Suk Seo; Jung Sun Kim
Journal:  Molecules       Date:  2017-09-14       Impact factor: 4.411

10.  R2R3 MYB-dependent auxin signalling regulates trichome formation, and increased trichome density confers spider mite tolerance on tomato.

Authors:  Yujin Yuan; Xin Xu; Yingqing Luo; Zehao Gong; Xiaowei Hu; Mengbo Wu; Yudong Liu; Fang Yan; Xiaolan Zhang; Wenfa Zhang; Yuwei Tang; Bihong Feng; Zhengguo Li; Cai-Zhong Jiang; Wei Deng
Journal:  Plant Biotechnol J       Date:  2020-07-26       Impact factor: 9.803

View more
  2 in total

1.  The Transcription Factor MYB37 Positively Regulates Photosynthetic Inhibition and Oxidative Damage in Arabidopsis Leaves Under Salt Stress.

Authors:  Yuanyuan Li; Bei Tian; Yue Wang; Jiechen Wang; Hongbo Zhang; Lu Wang; Guangyu Sun; Yongtao Yu; Huihui Zhang
Journal:  Front Plant Sci       Date:  2022-07-12       Impact factor: 6.627

Review 2.  MYB Transcription Factors Becoming Mainstream in Plant Roots.

Authors:  Zhuo Chen; Zexuan Wu; Wenyu Dong; Shiying Liu; Lulu Tian; Jiana Li; Hai Du
Journal:  Int J Mol Sci       Date:  2022-08-17       Impact factor: 6.208

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.