
Phylogenomic Analysis of R2R3 MYB Transcription Factors in Sorghum and their Role in Conditioning Biofuel Syndrome.

Vinay Singh1, Neeraj Kumar1, Anuj K Dwivedi1, Rita Sharma1, Manoj K Sharma1.   

Abstract

BACKGROUND: Large scale cultivation of sorghum for food, feed, and biofuel requires concerted efforts for engineering multipurpose cultivars with optimised agronomic traits. Due to their vital role in regulating the biosynthesis of phenylpropanoid-derived compounds, biomass composition, biotic, and abiotic stress response, R2R3-MYB family transcription factors are ideal targets for improving environmental resilience and economic value of sorghum.
METHODS: We used diverse computational biology tools to survey the sorghum genome to identify R2R3-MYB transcription factors followed by their structural and phylogenomic analysis. We used in-house generated as well as publicly available high throughput expression data to analyse the R2R3 expression patterns in various sorghum tissue types.
RESULTS: We have identified a total of 134 R2R3-MYB genes from sorghum and developed a framework to predict gene functions. Collating information from the physical location, duplication, structural analysis, orthologous sequences, phylogeny, and expression patterns revealed the role of duplications in clade-wise expansion of the R2R3-MYB family as well as intra-clade functional diversification. Using publicly available and in-house generated RNA sequencing data, we provide MYB candidates for conditioning biofuel syndrome by engineering phenylpropanoid biosynthesis and sugar signalling pathways in sorghum.
CONCLUSION: The results presented here are pivotal to prioritize MYB genes for functional validation and optimize agronomic traits in sorghum.
© 2020 Bentham Science Publishers.

Keywords:  Biofuel; R2R3-MYB; phenylpropanoids; sorghum; stress; transcription factors

Year:  2020        PMID: 32655308      PMCID: PMC7324873          DOI: 10.2174/1389202921666200326152119

Source DB:  PubMed          Journal:  Curr Genomics        ISSN: 1389-2029            Impact factor:   2.236


INTRODUCTION

Sweet sorghum accumulates high levels of directly fermentable sugars within the culm and therefore has immense potential as a biofuel crop [1]. The lignocellulosic biomass left after grain harvesting and sugar extraction can also be used as feedstock for biofuels [1, 2]. Due to limited resource requirements and abiotic stress tolerance, sorghum is considered a future-ready multipurpose crop. However, it has a very brief history of breeding. To adapt to changing climatic conditions and evolving market models, the development of regional ideotypes with enhanced grain yields, brix content, amenability to deconstruction and ability to thrive under adverse environmental conditions is required for large-scale deployment of sorghum as a biofuel crop [1]. Transcription factors (TFs), due to their regulatory roles, have been widely used in the past to improve the agronomic performance of crop plants [3-5]. MYB proteins constitute one of the largest and most functionally heterogeneous TF families and are widely distributed across eukaryotes. Since the identification of the first MYB protein in maize, a large number of MYB proteins have been identified from diverse plant species (Fig. ). MYB TFs are characterized by the presence of 1-4 highly conserved DNA binding MYB domains, with each of them typically denoted by “R” [6]. Based on the number of conserved domains, MYB TFs have been classified into four groups namely, 1R, 2R, 3R and 4R (four R1/R2 like repeats) [7]. Each MYB repeat is ~52 amino acids long and includes three hydrophobic residues (usually tryptophan) placed at regular intervals. Each MYB domain comprises three alpha-helices, of which the second and third helices form a helix-turn-helix (HTH) structure that interacts with the major groove of the DNA [8-10]. The highly conserved hydrophobic tryptophan residues play a significant role in sequence-specific interactions with DNA [11].
The evolutionary presence of MYB domain-containing proteins in lower organisms dates back to the time of divergence of plant and animal lineages [12-14]. However, lineage-specific expansion of MYB proteins resulted in diversity in MYB proteins across species. 3R-MYB proteins are the dominant MYBs in the animal kingdom, while 2R-MYB proteins are more prevalent in plants. The rapid expansion of 2R-MYB proteins occurred during the evolution of land plants and this expansion in plants is consistent with the whole-genome duplication events in angiosperms. Based on the evolutionary analysis, MYB proteins have been divided into three groups namely, A-, B- and C-groups. The earliest angiosperms had all three groups of MYB proteins, while later during evolution, lineage-specific duplication and domain loss resulted in variation in MYB family members in modern higher plants [15, 16]. Further, lineage-specific conservation of small regulatory motifs within the MYB domains and intron gain, supporting the intron late hypothesis, have been reported during MYB protein evolution in plants [12, 15, 17-19]. The 2R-MYBs, also referred to as R2R3-MYB TFs, are the most numerous in plants. Several of these proteins have been characterized through genetic approaches and shown to play roles in diverse biological processes, including morphogenesis, meristem formation, secondary cell wall biosynthesis, hormonal signal transduction, light signalling, male fertility, flowering, seed development, disease resistance, herbivore resistance, abiotic stress tolerance, and secondary metabolite biosynthesis [7, 16, 20, 21]. Many of these traits, such as flowering time, saccharification efficiency, cell wall composition, abiotic stress tolerance and disease resistance (collectively referred to as biofuel syndrome), are directly associated with biofuel production, making these genes ideal candidates for enhancing biofuel-related traits [21, 22].
In fact, the application of R2R3-MYB genes in improving sugar release from plant-based biomass has already been demonstrated in several plant species. For instance, overexpression of MYB31 and 42 in sugarcane improves glucose release with a concomitant decrease in acid-insoluble lignin [23]. Similarly, PtoMYB170 positively regulates lignin biosynthesis genes and lignin deposition in secondary walls of xylem cells during wood formation in poplar [24]. Whereas, ZmMYB31 downregulates several monolignol synthesis genes with its overexpression in transgenic plants resulting in significant reduction in lignin content [25]. Similarly, overexpression of PvMYB4 led to reduced lignin and increased saccharification in switchgrass [26]. Therefore, systematic identification and characterization of R2R3-MYB genes in sorghum is a key step towards optimising sweet sorghum for enhanced biofuel-related traits. Leveraging recent improvements in assembly and gene annotations of sorghum [27, 28], we performed genome-wide data mining to identify the complete repertoire of R2R3-MYB family genes in sorghum. Subsequently, we used inferences from localization, duplication, structure, phylogeny, orthology and gene expression to predict R2R3-MYB gene functions in sorghum. We identified a total of 134 R2R3-MYB genes in sorghum and classified them into 14 groups based on the phylogenetic analysis. Mapping of expression data along with information about orthologous gene functions from other plant species on phylogenetic tree facilitated gene function prediction. RNA-sequencing based expression profiling of R2R3-MYB genes from temporal stages of internodes of sweet sorghum cultivar SSV84 revealed contrasting roles of genes belonging to the same clade in phenylpropanoid biosynthesis and sugar signalling.

MATERIALS AND METHODS

Identification of R2R3 MYB Proteins

The Hidden Markov Model (HMM) profile of the MYB DNA-binding domain (PF00249), obtained from Pfam v31.0 [29] (http://pfam.xfam.org/family/PF00249), was queried against the sorghum proteome available at Phytozome v3.1 [30] (https://phytozome.jgi.doe.gov/pz/portal.html). The hits were filtered using an E-value cut-off of 1.0 and the presence of the MYB domain in all the proteins was confirmed using the NCBI-Conserved Domain Database [31] (CDD; https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) as well as the SMART domain search tool [32] (http://smart.embl-heidelberg.de/). The R2R3 type MYB proteins were segregated based on the number of MYB repeats.
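The filtering and segregation step described above can be illustrated with a minimal Python sketch. It assumes the domain hits have already been reduced to simple tuples of protein identifier, domain start, domain end and E-value; this tuple layout, the function names and the example identifiers are hypothetical and are not part of the original pipeline.

```python
from collections import defaultdict


def count_myb_repeats(hits, max_evalue=1.0):
    """Count MYB domain hits per protein after E-value filtering.

    hits: iterable of (protein_id, start, end, evalue) tuples
    (a hypothetical, simplified form of a per-domain hit table).
    """
    counts = defaultdict(int)
    for protein_id, _start, _end, evalue in hits:
        if evalue <= max_evalue:  # E-value cut-off of 1.0, as in the text
            counts[protein_id] += 1
    return dict(counts)


def select_r2r3(hits, max_evalue=1.0):
    """Return sorted IDs of proteins carrying exactly two MYB repeats."""
    counts = count_myb_repeats(hits, max_evalue)
    return sorted(pid for pid, n in counts.items() if n == 2)


# Made-up hits for illustration only.
example_hits = [
    ("protA", 10, 60, 1e-12),
    ("protA", 63, 110, 1e-9),
    ("protB", 5, 55, 1e-5),
    ("protC", 8, 58, 1e-8),
    ("protC", 61, 109, 5.0),  # fails the E-value cut-off
]
print(select_r2r3(example_hits))  # ['protA']
```

In practice the repeat count would still need to be confirmed against CDD and SMART, as described in the text; the sketch covers only the counting logic.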

Chromosomal Localization and Duplication Analysis

The chromosomal coordinates of all R2R3-MYB proteins were extracted from Phytozome and mapped onto sorghum chromosomes using Mapchart v2.30 [33] (https://www.wur.nl/en/show/Mapchart-2.30.htm). Tandemly duplicated genes were identified using the Plant Tandem Duplicated Genes Database (PTGBase) [34], whereas information about segmental duplications of sorghum MYB genes was extracted from the Plant Genome Duplication Database (PGDD) [35]. The segmental duplications were illustrated using Circos v0.69-5 (http://circos.ca/; [36]). The subcellular localization of all proteins was checked using five different online tools including Plant-mPLoc [37] (http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/), MultiLoc2 [38] (https://abi-services.informatik.uni-tuebingen.de/multiloc2/webloc.cgi), Cello v 2.5 [39] (http://cello.life.nctu.edu.tw/), DeepLoc-1.0 [40] (http://www.cbs.dtu.dk/services/DeepLoc/cite.php) and WoLF PSORT [41] (https://wolfpsort.hgc.jp/).

Identification of Orthologs and Synteny with Other Genomes

The orthologous MYB proteins from other genomes were identified using Inparanoid v4.1 [42] with 100% bootstrap confidence (http://inparanoid.sbc.su.se/cgi-bin/index.cgi). For identifying orthologous genes on syntenic regions between maize, rice, Arabidopsis, Brachypodium and sorghum, chromosomal coordinates of all R2R3-MYB proteins were mapped onto PGDD [35] (http://chibba.agtec.uga.edu/duplication/) and visualized using Circos v0.69-5 [36] (http://circos.ca/).

Structural Analysis

The protein sequences of all R2R3-MYB proteins were aligned using MAFFT v7.1 [43] with the iterative refinement method (1000 cycles; https://mafft.cbrc.jp/alignment/software/). Sequence features of the MYB domains were analysed by generating the weblogo of the aligned MYB domain using Weblogo 3 [44] (http://weblogo.threeplusone.com/create.cgi). Molecular weights and theoretical isoelectric point (pI) values of all the proteins were calculated using the ProtParam program at the Expasy bioinformatics resource portal [45] (https://web.expasy.org/protparam/). The information about exon-intron distribution was extracted from Phytozome and plotted using the Gene Structure Display Server [46] (http://gsds.cbi.pku.edu.cn/). The conserved motifs in R2R3-MYB proteins were identified using the MEME search tool [47] (http://meme-suite.org/tools/meme) with the following parameters: motif discovery mode: classic; site distribution: zoops (zero or one occurrence per sequence); the number of motifs: 15; motif length: 6-50 amino acids. All 15 MEME motifs were scanned in TOMTOM (http://meme-suite.org/tools/tomtom) using all plant motif matrix profiles of JASPAR (http://jaspar.genereg.net/) with plant PFMs (position frequency matrices) as a target for similarity with any known motifs. Promoter sequences (2000 bp upstream region) of all SbMYB genes were extracted using the BioMart tool of Phytozome [30]. Sequences were submitted to the PlantCARE website (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) to identify the cis-regulatory elements. The element matrix was visualized using Gephi, an open graph visualization platform [48].
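To make the promoter definition concrete, the following Python sketch computes the coordinates of a 2000 bp upstream region from a gene's position and strand. The promoters in this study were retrieved through BioMart; the function below is only an illustrative equivalent, and its name, its 1-based inclusive coordinate convention and the example coordinates are assumptions.

```python
def promoter_interval(gene_start, gene_end, strand, chrom_length, upstream=2000):
    """Return the 1-based inclusive (start, end) of the upstream region.

    For '+' strand genes the promoter lies before gene_start; for '-'
    strand genes it lies after gene_end (the extracted sequence would
    then be reverse-complemented). Returns None when the gene sits at
    the chromosome edge and no upstream sequence exists.
    """
    if strand == "+":
        end = gene_start - 1
        start = max(1, end - upstream + 1)  # clip at chromosome start
    elif strand == "-":
        start = gene_end + 1
        end = min(chrom_length, start + upstream - 1)  # clip at chromosome end
    else:
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        return None
    return start, end


# Made-up coordinates for illustration only.
print(promoter_interval(5001, 7000, "+", 100000))  # (3001, 5000)
print(promoter_interval(5001, 7000, "-", 100000))  # (7001, 9000)
```

Clipping at chromosome ends means that genes close to an edge yield promoters shorter than 2000 bp, which may matter when element counts are compared across genes.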

Phylogenetic Analysis of MYB Proteins

The multiple alignment of R2R3-MYB proteins was performed using standalone MAFFT v7.407 [43] (https://mafft.cbrc.jp/alignment/software/). Gap opening and gap extension penalty were set to 2.3 and 0.63, respectively. Alignment results, obtained above, were further checked and manually corrected using Jalview [49]. The corrected alignment was imported in MEGA (version 7.0) [50] to generate phylogenetic trees using neighbour-joining (NJ) and Maximum Likelihood (ML) methods with the Poisson model and pairwise deletion of gaps. The bootstrap analysis was performed with 1,000 replicates. For phylogenomic analysis of sorghum R2R3-MYB proteins with the genetically characterized R2R3-MYB proteins from other plant species, information about characterized MYB proteins was manually extracted from literature and their protein sequences were retrieved from Phytozome or NCBI sequence databases. A combined phylogenetic tree of sorghum R2R3-MYB proteins with previously characterized R2R3-MYB proteins from other plant species was generated using MEGA7. For the visualization of phylogenetic tree, Interactive Tree of Life (iTOL) server was used [51] (https://itol.embl.de/).
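As an illustration of the distance measure underlying the NJ analysis, the sketch below computes a Poisson-corrected distance, d = -ln(1 - p), between two aligned protein sequences, where p is the proportion of differing sites and sites with a gap in either sequence are skipped (pairwise deletion). It is a simplified stand-in for the calculation performed inside MEGA, not a reimplementation of it; the function name and example sequences are hypothetical.

```python
import math


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected distance between two aligned protein sequences.

    Columns with a gap in either sequence are ignored (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped sites
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    p = differing / compared  # proportion of differing sites
    if p >= 1.0:
        return math.inf  # correction is undefined when every site differs
    return -math.log(1.0 - p)


# Made-up aligned fragments: 6 comparable sites, 2 differences, p = 1/3.
d = poisson_distance("MKRG-WT", "MKKGAWS")
print(round(d, 4))  # 0.4055
```

A matrix of such pairwise distances is the input from which a neighbour-joining tree is built.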

Expression Analysis Using Publicly Available Data

To analyse the expression profiles of MYB genes in different developmental tissues, publicly available RNA-seq data from different stages of vegetative and reproductive development, and four different stages of stem internodes were extracted from previously published studies [52, 53]. To visualize expression patterns in a phylogenetic context, expression data were plotted onto the phylogenetic tree using the Interactive Tree of Life (iTOL) server [51] (https://itol.embl.de/).

Brix Content

To measure brix content, five biological replicates of middle internodes were collected from field-grown Sorghum bicolor cultivar SSV84 plants at three distinct stages of development corresponding to booting, milky to soft dough and physiological maturity. The juice was extracted by squeezing internodes and brix content was determined through a digital hand saccharometer/refractometer (Atagotype) [54]. Three biological readings were subjected to ANOVA (Analysis Of Variance) using ToolPak analysis in Microsoft Excel.
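For readers without access to the spreadsheet tool, the single-factor ANOVA can be sketched in plain Python as below. This reproduces only the F statistic and its degrees of freedom; the p-value additionally requires the F distribution and is omitted. The brix readings in the example are invented for illustration and are not data from this study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of measurements per stage.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual readings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical brix readings (%) at three stages, three readings each.
booting = [8.0, 9.0, 10.0]
dough = [12.0, 13.0, 14.0]
maturity = [17.0, 18.0, 19.0]
print(one_way_anova([booting, dough, maturity]))
```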

RNA Sequencing and Data Analysis

To analyse the expression patterns of R2R3-MYB genes during sugar accumulation in sweet sorghum, middle internodes from field-grown sorghum plants (variety SSV84) were harvested at three distinct developmental stages namely, booting, milky to soft dough and physiological maturity. RNA was extracted using TRIzol reagent and assessed using Agilent 2100 Bioanalyzer [55]. High-quality RNA (RIN >8), extracted from two biological replicates of each stage, was used for the preparation of sequencing libraries using TruSeq® Stranded Total RNA kit as per manufacturer’s instructions followed by sequencing using Illumina HiSeq 2000 paired-end sequencing platform with an average read length of 100 bp. The low-quality reads (Phred score <30), adaptors and ribosomal RNA were removed using AdapterRemoval version 2.2.0 (https://adapterremoval.readthedocs.io/en/latest/) and Bowtie version 2.2.9, respectively. High-quality reads were aligned to the sorghum genome using the Hisat2 program (https://ccb.jhu.edu/software/hisat2/index.shtml) [56]. The expression values of all genes were estimated using StringTie version 1.3.3b (https://ccb.jhu.edu/software/stringtie/). The expression data of all MYB genes of sorghum was subjected to K-means clustering using MultiExperiment Viewer [57].
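The clustering step can be illustrated with a small pure-Python K-means sketch. It is not the MultiExperiment Viewer implementation: it uses squared Euclidean distance and, for reproducibility, initializes centroids from the first k profiles, both of which are assumptions. The expression profiles in the example are invented.

```python
def kmeans(points, k, iterations=100):
    """Cluster expression profiles with a basic K-means (Lloyd) procedure.

    points: list of equal-length numeric lists (one profile per gene).
    Returns (labels, centroids).
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles across three stages: two rising, two falling.
profiles = [[1.0, 2.0, 3.0], [9.0, 5.0, 1.0], [1.2, 2.1, 3.3], [8.5, 5.2, 0.9]]
labels, _ = kmeans(profiles, k=2)
print(labels)  # [0, 1, 0, 1]
```

Because K-means is sensitive to scale, expression values are commonly log-transformed or standardized per gene before clustering; whether that was done here is not stated in the text.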

RESULTS AND DISCUSSION

Sorghum Genome Encodes 134 R2R3-MYB Genes with Evolutionarily Conserved Structural Features

MYB family transcription factors are characterized by a highly conserved helix-turn-helix DNA binding (MYB) domain, usually at the N-terminus of the proteins. MYB domain in R2R3-MYB proteins consists of two consecutive imperfect repeats of 50-53 amino acids. Each repeat forms three alpha-helices with the first two helices forming helix-turn-helix structure while the third recognition helix is involved in DNA binding [9, 58]. After removing redundancy and confirming the presence of two MYB repeats, we annotated 134 R2R3-MYB proteins from sorghum (Fig. ; Supplementary Table ). Based on the chromosomal coordinates, provided in Phytozome, we named sorghum R2R3-MYB genes from SbMYB001 to SbMYB134. In most of the proteins, R2 and R3 domains were present in tandem, as expected. Conversely, thirteen MYB proteins contained long linker regions between two repeats that ranged from 35 to 345 amino acids (Fig. ). The average length of these proteins was around 356 aa with SbMYB070 encoding the smallest protein of 174 aa and the SbMYB119 coding for the longest protein of 1562 aa (Supplementary Table ). Similarly, the average molecular weight of sorghum MYB proteins was estimated to be 38.7 kDa with an isoelectric point (pI) ranging from 4.57 to 10.68 (Supplementary Table ). Analysis of subcellular localization using five different online prediction tools suggested nuclear localization for all the R2R3-MYB proteins conforming to their role in transcriptional regulation (Supplementary Table ). The number of R2R3-MYB genes accounts for 0.392% of the total genes annotated in the sorghum genome. This is similar to the proportion of R2R3-MYB genes annotated in rice with 88 genes (0.393%), Brachypodium with 85 genes (0.389%) and maize with 157 genes (0.387%) though the proportion of MYB genes in Arabidopsis is higher (0.59%) with 138 genes [59, 60] (Fig. ). 
To compare the orthologous relationships among MYB proteins in these species, we identified orthologs of sorghum R2R3-MYB genes in all four genomes. The results were in agreement with the estimated evolutionary distance among these species [61] (Fig. ). For instance, maize, estimated to have diverged about 25 million years ago from sorghum, had the maximum number (117) of orthologs of sorghum genes (Fig. and ). Whereas, rice and Brachypodium, estimated to have diverged about 70 million years ago from sorghum clade, had 107 orthologs each (Fig. , , and ). As expected, dicot plant Arabidopsis, with an estimated divergence time of more than 150 million years, had orthologs for only 51 sorghum R2R3-MYB genes (Fig. and ; Supplementary Table ). Further, to examine the conserved sequence features of R2 and R3 MYB repeats, we performed multiple alignments and generated separate sequence logos for both R2 and R3 domains (Fig. ). As shown in their counterparts from other plant species, landmark tryptophan (W) residues (three in R2 and two in R3 domain), were conserved in both the repeats of sorghum proteins (Fig. ). These residues are involved in sequence-specific binding of DNA [62]. The first tryptophan in the R3 domain was replaced by another hydrophobic residue mostly phenylalanine (Fig. ). A cysteine residue in the third helix of the R2 domain was also conserved (Fig. ). Cysteine at this position in a maize protein is essential for DNA binding [58]. Several other residues such as glutamic acid (E)-11, aspartic acid (D)-12, leucine (L)-15, glycine (G)-23, leucine (L)-36 and leucine (L)-51 were also conserved in R2 repeat. Whereas in R3 repeat, proline (P)-2, glutamic acid (E)-13, isoleucine (I)-31, arginine (R)-8, asparagine (N)-41 and asparagine (N)-45 were also conserved. It would be interesting to explore the functional relevance of these conserved amino acids. 
In addition to conserved motifs within the proteins, cis-regulatory elements may further strengthen the functional categorization based on their phylogenetic placement. Towards this, MYB gene promoter sequences (2000 bp) were evaluated for the presence of various cis-regulatory elements (Supplementary Table ). Promoter elements, identified from regulatory regions of MYB genes, were characterized into four major categories, i.e. growth and development, biosynthetic regulation, biotic stress and abiotic stress (Supplementary Fig. ). More than 50% of the total cis-regulatory elements were associated with growth and development, which is consistent with the predicted roles of R2R3 MYB proteins in regulating plant growth and development [20]. Further, among the growth and development-related elements, light-responsive elements were present in almost all the promoters with the highest frequency compared to other categories (Supplementary Table ).

Analysis of Evolutionary Relationships between Sorghum R2R3-MYB Proteins

To investigate the evolutionary relationships among sorghum MYB proteins, we aligned complete protein sequences of R2R3-MYB proteins and constructed unrooted phylogenetic trees using the NJ (Fig. ) and ML (Supplementary Fig. ) methods. Five proteins (MYB008, 017, 079, 086 and 119) with long linker regions between MYB repeats did not align and were therefore not included in the phylogenetic analysis. Based on the sequence similarity and tree topology, R2R3-MYB proteins were divided into 14 groups numbered from I to XIV using Roman numerals (Fig. ). Two proteins, SbMYB115 and 070, that did not fall in any of the groups, are highlighted by letters a and b in the phylogenetic tree. The number of proteins in each group ranged from 2 to 24. The classification of MYB proteins was also supported by the structural features of these proteins. Although the number of introns varied from 0 to 13, 80% of the R2R3-MYB genes had one or two introns (Supplementary Table ). Further, we also noticed group-specific patterns; for example, all genes belonging to group IX, except one, lacked introns (Fig. ). In contrast, all genes from groups XII and XIV had three or more introns. A total of fifteen genes were predicted to encode multiple transcripts, while for the remaining 119 genes, only one transcript was annotated. Although the length of introns varied among genes belonging to the same group, group I genes had the smallest introns. The distribution of proteins into different clades mostly remained the same in the tree generated using the ML method (Supplementary Fig. ). The analysis of MEME motifs revealed similar motif distribution among members of the same group. A total of 15 motifs were identified from R2R3-MYB proteins of sorghum [63, 64] (Supplementary Table ). Eight of these motifs (1-6, 9 & 10) represent the smaller conserved regions within the R2R3 MYB domains. Motifs 1-5 were conserved in most of the MYB proteins except group XIII.
Motif-6 was conserved in the MYB domain of proteins belonging to group I-VII and XIII, while, group IX and X members had motif-10 in place of motif-6 (Fig. ). In rest of the proteins, this region seems to have diverged (Fig. ). Members of group XIII and XIV showed rearrangement of the MEME motifs corresponding to the MYB domain. In some of the group XIII and XIV proteins, motif-9 corresponded to the part of R2 MYB domain while motif-11 corresponded to the R3 MYB domain of group XIII members. Closer investigation revealed that the linker region between R2 and R3 repeats of group XIII members is unusually long. Conservation of phylogenetic clade-specific MEME motifs in the regions of R2R3 MYB domains and variation in the linker region joining R2 and R3 domain suggest divergence of MYB domain architecture in sorghum MYB proteins and subsequent functions (Fig. ). Some motifs present in the highly variable C-terminal region of R2R3-MYB proteins may also contribute to the functional divergence of MYB genes by acting as transcriptional activators or repressors (Supplementary Table ). Further, some of them likely provide sites for post-translational modifications and physical interactions [65]. Motif-7 is only present in group VI members and is found close to the R3-MYB domain. It has conserved histidine, leucine, aspartic acid, serine, methionine, and arginine residues and has been detected in genes that specifically regulate JA-dependent transcriptional responses [66]. Further, MYB proteins having this motif align with characterized MYB genes implicated in abiotic stress (Fig. ). Motif-8 is specifically rich in glutamine (Q) residues and is present in members of group 1-6 (Supplementary Table ). Poly (Q) motifs have been found to stabilize the protein-protein interactions and are associated with protein aggregation in plants as well as animals [67, 68]. 
In humans, the presence of poly(Q) motifs has been associated with molecular pathogenesis responsible for several diseases [69]. Motif-12 is only present in group III members and has ExWLL/FDD motif (Supplementary Table ). It is located at the C-terminal end of the proteins. NCBI search of this motif showed that it is specifically associated with MYB or MYB-related proteins. When searched in the plant motif matrix profile of JASPAR for similarity with known motifs, results suggested that it may recognize GA-rich DNA motifs and regulate anthocyanin or flavanol content in plants (Supplementary Table ). Interestingly, some of the characterized MYB proteins that align with group III members are known to regulate anthocyanin pathway and stress tolerance. Motif-13 has a highly conserved “Phe-Leu-Gly-Val-Gly”, which has been shown to bind to DNA [70]. It is exclusively present in group X members and is present at the C-terminal end of the MYB proteins. Motifs-14 and 15 are single amino acid repeats of histidine and proline, respectively. Histidine rich motifs can interact with metal ions and have been shown to play diverse roles in DNA-protein interactions, protein conformation, nuclear targeting and transcriptional regulation [71-73]. Whereas, consecutive proline residues in the coding sequence have been shown to cause translational slow down through ribosome stalling. Recently, Gall and co-workers [74] described Mg2+-dependent control of translational speed as a metabolite sensor mechanism, which utilizes relatively slow translation of proline codons. Therefore, the presence of motifs that regulate the translation speed may be important for translational regulation of MYB proteins [75, 76].

Chromosomal Localization and Duplication Analysis Suggest a Key Role of Gene Duplications in the Expansion of R2R3-MYB Family in Sorghum

Based on the chromosomal coordinates, we localized sorghum R2R3-MYB genes on respective chromosomes (Fig. ). Of the 134 genes, 133 mapped onto sorghum chromosomes, while the remaining one was traced to unanchored contigs. The R2R3-MYB genes showed an uneven distribution with the maximum number of genes (26) localized on chromosome 3, while only five genes were located on chromosome 10 (Fig. ). Interestingly, most of the genes mapped on the terminal regions of chromosomes (Fig. ). Analysis of tandem arrays of identical sequences in close genomic proximity revealed six clusters of tandemly duplicated genes. Except for a cluster of three genes on chromosome 3, the remaining five clusters contained only two genes each (Fig. ). The percent identity between tandem duplicated genes varied from 33 to 67% (Supplementary Table ). Conversely, 28 pairs of duplicated genes that mapped to duplicated segments of the sorghum genome exhibited 39 to 70% identity among duplicated gene pairs (Supplementary Table ). Except for chromosome 10, all chromosomes contained segmentally duplicated genes with the maximum number of genes mapped on syntenic regions between chromosomes 4 and 6 as well as chromosomes 3 and 9 (Fig. ). Mapping of duplicated genes on the phylogenetic tree revealed clade-specific expansion of R2R3-MYB genes. Group I with eight pairs of duplicated genes has the highest number of paralogous genes. Similarly, group VI with nine paralogous genes also seems to have undergone expansion due to segmental duplications. The non-synonymous/synonymous substitution (Ka/Ks) ratio of all duplicated gene pairs was less than 1, suggesting that R2R3-MYB proteins have evolved under strong purifying selection (Supplementary Table ). The lower number of tandem duplicates compared to gene pairs on segmentally duplicated regions in R2R3-MYB proteins may also be attributed to positive selection driving higher retention of segmental duplicates [77].
The divergence time of duplicated gene pairs was estimated to vary between 30 to 100 million years ago.
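The interpretation of Ka/Ks values and the conversion of Ks into an approximate divergence time can be sketched as follows. The text does not state how divergence times were derived; the sketch assumes the commonly used relation T = Ks / (2λ) and a synonymous substitution rate λ of 6.5 × 10⁻⁹ per site per year, a value often applied to grasses. Both the formula and the rate are assumptions here, as are the function names and example values.

```python
def selection_mode(ka, ks, tolerance=1e-9):
    """Return (Ka/Ks ratio, inferred mode of selection) for a gene pair."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        mode = "purifying"  # non-synonymous changes are selected against
    elif ratio > 1 + tolerance:
        mode = "positive"   # non-synonymous changes are favoured
    else:
        mode = "neutral"
    return ratio, mode


def divergence_time_mya(ks, rate=6.5e-9):
    """Approximate divergence time in million years: T = Ks / (2 * rate).

    rate is the assumed synonymous substitution rate per site per year.
    """
    return ks / (2 * rate) / 1e6


# Hypothetical Ka and Ks values for one duplicated pair.
print(selection_mode(0.1, 0.5))
print(divergence_time_mya(0.65))
```

Under these assumptions, a Ks of 0.65 corresponds to roughly 50 million years, which falls inside the range reported above.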

Combined Phylogenetic Analysis with Experimentally Characterized Orthologs from Other Species Reveals Intra-clade Functional Diversity

Several R2R3-MYB genes have been experimentally characterized in different plant species. We thoroughly mined literature for experimentally characterized R2R3-MYB proteins and shortlisted 123 genes from different plant species (Supplementary Table ). Based on the major phenotype in overexpression, mutant or silencing lines, these were categorized into three major functional classes including plant development, biotic/abiotic stress response and biosynthetic regulation. A combined phylogenetic tree using 129 sorghum R2R3-MYB proteins and 123 characterized R2R3-MYB proteins from various plant species was generated (Fig. ). The topology of the tree was the same as that of the sorghum R2R3-MYB tree presented in Fig. () except for minor rearrangements. A group IV protein, SbMYB006, aligned with the group VIII clade, whereas SbMYB060, which aligned with SbMYB028 of group XII earlier, aligned with SbMYB070 (designated with b) in the combined tree. Except for groups VII, XIII and XIV, experimentally characterized R2R3-MYB proteins were dispersed throughout different phylogenetic groups (Fig. ). Group I had 34 characterized genes with seven of them implicated in plant growth and development, nine genes regulating biotic and abiotic stress response and 18 genes involved in biosynthetic processes. Similarly, 39 characterized genes clustered with group II, with about an equal number of genes implicated in stress response (17) and biosynthetic processes (16), and six genes associated with plant development (Supplementary Table ). Group III contained six characterized genes, all of which are characterized in dicot plants and form a separate subclade within group III. These determine abiotic stress response by regulating cuticular wax biosynthesis. Group IV had five characterized genes, three of which regulate plant development (Supplementary Table ).
Similarly, group V contained eight characterized genes, four of which are involved in regulating plant development, while the remaining four are associated with biosynthetic processes associated with secondary cell wall biosynthesis and lignification (Supplementary Table ). Group VI had 14 characterized genes with eight of them associated with stress response while the remaining five genes regulate male reproductive organ (anther) development. Group VIII had only one characterized gene involved in the specification of sperm cells. Group IX, on the other hand, had five characterized genes with four of them making a separate clade implicated in abiotic stress response. Group X had only two characterized genes with one of them involved in ovule development and the other gene associated with multiple stress responses. Furthermore, group XI contained two characterized genes involved in vegetative to embryonic transition during embryogenesis, whereas three characterized genes of group XII determine drought tolerance by regulating stomatal development (Supplementary Table ). The representation of all three functions in most of the clades highlights intra-clade functional diversity in R2R3-MYB genes.

Role of R2R3-MYB Genes in Phenylpropanoid Biosynthesis

Due to high biomass content, lignocellulosic biomass of sorghum holds immense potential as biofuel feedstock [2]. However, recalcitrance due to the presence of lignin is a major constraint to enzymatic degradation of lignocellulosic biomass. Therefore, reducing the amount of lignin using biotechnological approaches is a major area of investigation in engineering lignocellulosic feedstock [78]. R2R3-MYB transcription factors have been shown to play a significant role in lignin biosynthesis and its deposition on plant secondary cell walls. While some of them have been shown to activate lignin biosynthesis, others act as negative regulators by directly binding to AC elements in the promoters of lignin biosynthetic genes [53, 79-83]. The suggested functions of R2R3-MYB genes in secondary cell wall development are conserved in both monocot and dicot lineages [84]. For example, ectopic expression of ZmMYB31 and 42 resulted in reduced lignin content and improved saccharification efficiency in Arabidopsis [85]. To investigate the role of R2R3-MYB genes in secondary cell wall lignification in sorghum stems, we leveraged the RNA sequencing data available from four top internodes of bioenergy sorghum, R.07020, that represent sequential phases of development involving cell division, elongation and differentiation [53]. The first internode, being the youngest, exhibits active cell division while the fourth internode demonstrates cessation of growth and accumulation of secondary cell walls. Out of 123 R2R3-MYB genes for which expression data in internodes are available, 31 exhibited the highest expression in the fourth internode, thereby suggesting a role in secondary cell wall biosynthesis (Supplementary Table ). However, eleven genes viz., SbMYB009, 037, 060, 086, 090, 092, 111, 115, 126, 131 and 132 exhibited higher expression in younger internodes that gradually declined in mature internodes, suggesting their involvement in cell division.
Interestingly, most of the 31 genes with the highest expression in the fourth internode belong to groups I-VI, whereas genes showing higher expression in younger internodes are distributed across all phylogenetic groups, with most of them having a diversified R2 or R3 MYB domain (Fig. ). Three genes, SbMYB019, 059 and 117, exhibited a similar level of expression in all four internodes, while the rest of the genes mostly exhibited variable or very low expression. By collating information on characterized orthologs from other plant species, phylogenetic placement and expression data in internodes, we have predicted key targets for secondary cell wall biosynthesis in sorghum. Notably, SbMYB025 and SbMYB101 of group I are syntelogs of ZmMYB31 and ZmMYB42, which have already been shown to negatively regulate lignin biosynthesis in maize [85]. Similarly, three genes from one of the clades in group I, AtMYB20, AtMYB85 and PtMYB1, have earlier been shown to act as activators of monolignol biosynthesis [79, 86]. In addition, sorghum genes in this group have a greater number of cis-regulatory elements associated with abiotic stress and biosynthetic regulation. Therefore, based on the expression patterns and sequence homology with these genes, we predict SbMYB024, 065, 095, 102 and 130 of group I as important candidates for engineering lignin biosynthesis. Further, SbMYB066 and 089 of group II are closely related to MYB63 and MYB58 of Arabidopsis, which have been reported to act as monolignol activators [79]. SbMYB066 in our study corresponds to a previously characterized gene, SbMYB60, that has been shown to affect both primary and secondary metabolism by regulating UDP-sugar and cellulose biosynthesis-related genes [87]. Biosynthetic regulation- and stress-associated cis-elements were also detected in their promoters. Another gene, SbMYB112 of group V, forms a separate clade with PtMYB4 and AtMYB46, both of which act as monolignol activators [86, 88]. 
Overexpression of AtMYB46 resulted in the ectopic deposition of secondary walls around parenchymatous cells in Arabidopsis. In fact, Secondary wall-associated NAC Domain protein 1 (SND1), a regulator of the developmental program of secondary wall biosynthesis, binds to the AtMYB46 promoter and activates the biosynthetic pathways of secondary cell wall thickening, suggesting that AtMYB46 is a master regulatory component of secondary wall biosynthesis in Arabidopsis. Hence, it would be interesting to study whether SbMYB112 performs a similar function in sorghum. Similarly, SbMYB003 is orthologous to AtMYB52, which acts as a monolignol repressor [89]. These genes are important candidates for engineering lignocellulosic biomass composition and thereby saccharification efficiency in sorghum. In addition to lignin, R2R3-MYB genes also regulate the biosynthesis of a diverse array of flavonoids (anthocyanins, proanthocyanidins, flavonols, isoflavonoids, and phlobaphenes), phenolic acids and stilbenes [21, 90]. These compounds are of significant interest from both commercial and human health perspectives due to their anti-inflammatory and chemoprotective properties [90]. A total of eighteen genes of group II, from diverse plant species, have been experimentally shown to regulate flavonoid biosynthesis (Fig. ). Interestingly, all of them act as activators. Therefore, sorghum genes belonging to group II, including SbMYB007, 009, 043, 048, 063, 083 and 103, are putative candidates for engineering flavonoid biosynthesis in sorghum. Two motifs associated with biosynthetic regulation, TGACG and CGTCA, were also identified in their promoters. Among these genes, SbMYB103 shows high homology with Yellow seed 1 (Y1), which is essential for the synthesis of 3-deoxy flavonoid pigments in sorghum [91]. Loss of function of Y1 leads to a yellow color due to the lack of visible phlobaphene in the seed pericarp.
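
The internode-based screening described in this section (genes peaking in the fourth internode, genes declining from young to mature internodes, and genes with similar expression throughout) can be illustrated with a minimal sketch. The gene names, expression values, the 20% tolerance for "similar" expression and the function name are all hypothetical, chosen only for illustration; this is not the procedure used to generate the results reported here.

```python
def classify_internode_profile(values, flat_tolerance=0.2):
    """Label a four-internode expression profile (youngest internode first).

    flat_tolerance is an assumed cut-off, not a value from the study.
    """
    lowest, highest = min(values), max(values)
    if highest == 0:
        return "not expressed"
    # Similar level everywhere: the spread is small relative to the maximum.
    if (highest - lowest) / highest <= flat_tolerance:
        return "uniform"
    # Highest value found in the last (most mature) internode.
    if values.index(highest) == len(values) - 1:
        return "peak in mature internode"
    # Expression never increases from young to mature internodes.
    if all(a >= b for a, b in zip(values, values[1:])):
        return "declining with maturity"
    return "variable"


# Hypothetical expression values for internodes 1 to 4 (illustration only).
profiles = {
    "geneA": [2.0, 5.0, 9.0, 20.0],
    "geneB": [30.0, 18.0, 7.0, 1.0],
    "geneC": [10.0, 10.5, 9.8, 10.2],
    "geneD": [1.0, 8.0, 3.0, 2.0],
}
labels = {gene: classify_internode_profile(v) for gene, v in profiles.items()}
```

In this toy input, geneA would be flagged as a candidate for secondary cell wall biosynthesis and geneB as a candidate for cell division, mirroring the reasoning applied to the sorghum genes above.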

Role of Sorghum R2R3-MYB Genes in Vegetative and Reproductive Organ Development

Several R2R3-MYB genes have been shown to play a major role in regulating different aspects of plant growth and development, including female reproductive development [92], morphogenesis [93], branch number [94], sperm cell differentiation [95] and seed size [96]. To investigate the spatial and temporal expression patterns of sorghum R2R3-MYB genes, we used publicly available data from nine developmental tissues, viz., leaves, pre-emergence inflorescence, post-emergence inflorescence, anthers, pistils, seeds collected 5 and 10 days after pollination, embryo and endosperm (Supplementary Table ). The expression data were mapped onto the phylogenetic tree to determine group-specific conservation in expression patterns (Fig. ). Although the genes exhibited diverse expression patterns, a few clade-specific inferences could be made. For instance, genes belonging to groups X and XI exhibited low-level expression, whereas most of the genes belonging to groups XII, XIII and XIV exhibited high-level ubiquitous expression. Genes belonging to groups IV and V exhibited preferential accumulation in inflorescence tissues. Some distinct tissue-specific expression patterns were also observed: SbMYB130 showed embryo- and endosperm-specific accumulation, SbMYB113 and SbMYB046 of group VI exhibited pistil-preferential expression, and SbMYB133 of group VII exhibited the highest expression in anthers. Seed-specific cis-regulatory elements (RY-element, O2-site, GCN4_motif, HD-Zip 1, AACA_motif) were detected in the promoter region of SbMYB133. Expression of SbMYB125 and SbMYB134 was confined to endosperm tissue, while SbMYB060 exhibited maximum accumulation in embryos. Similarly, expression of SbMYB066, 089 and 098 was specific to post-emergence inflorescence, while SbMYB061, 078, 081, 094 and 128 exhibited preferential expression in the pre-emergence inflorescence. 
Many of the tissue-specific genes have also been implicated in the biosynthesis of phenylpropanoids or related compounds, indicating that they may regulate plant development by fine-tuning secondary metabolites.
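
Tissue specificity of the kind described above can be expressed as a single number per gene. The sketch below uses the Tau index, a widely used specificity measure that is not part of the analysis in this study and is swapped in here purely as an illustration: for expression values x_i across n tissues, tau = sum(1 - x_i / x_max) / (n - 1), so that 0 indicates uniform expression and 1 indicates expression confined to one tissue. The function name and expression vectors are hypothetical.

```python
def tau_index(values):
    """Tau tissue-specificity index for one gene.

    Returns a value between 0 (uniform) and 1 (single-tissue expression),
    or None when the gene is not expressed in any tissue.
    """
    peak = max(values)
    if peak <= 0:
        return None
    n = len(values)
    return sum(1 - v / peak for v in values) / (n - 1)


# Hypothetical expression across nine tissues (illustration only).
single_tissue = [0.0] * 8 + [12.0]   # expressed in one tissue only
ubiquitous = [4.0] * 9               # same level in every tissue

tau_single = tau_index(single_tissue)
tau_ubiquitous = tau_index(ubiquitous)
```

A gene confined to one tissue scores at the top of the scale, while a ubiquitously expressed gene scores at the bottom; where to draw the line between "specific" and "preferential" remains a judgment call.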

Contrasting Roles of Sorghum MYB Proteins in Phenylpropanoid and/or Sugar Signalling

Although the R1/2-type MYB gene OsMYBS1 of rice and its ortholog in Arabidopsis have been demonstrated to play a role in sugar signalling, the role of R2R3-MYB genes in sugar signalling remains unexplored [97]. Some MYB genes have been shown to regulate phenylpropanoid biosynthesis through sugar signalling. For example, loss of function of the PAP1/MYB75 gene in Arabidopsis impairs sugar-dependent inducibility of dihydroflavonol reductase, a key enzyme in anthocyanin biosynthesis [98, 99]. To examine the role of R2R3-MYB genes in sugar accumulation and/or signalling, we performed transcriptome profiling of internodes collected from the sweet sorghum cultivar SSV84. The Brix assay with internodes revealed that the Brix content increases from booting to the milky stage, with a peak at physiological maturity [100] (Fig. and ). Two biological replicates of middle internodes were collected at the booting, milky to soft dough, and physiological maturity stages and used for RNA sequencing. More than 500 million reads were generated, of which about 95% mapped to the sorghum genome (Supplementary Table ). Of the 134 R2R3-MYB genes annotated in this study, 65 exhibited FPKM >1 in sorghum internodes (Supplementary Table ). Based on their expression patterns, they could be divided into nine distinct clusters (Fig. ). Cluster 1 comprised genes exhibiting minimal variability in expression among all three stages. Cluster 2 contained nine genes whose expression gradually increased from booting to physiological maturity. Two of these genes fall in group V of the phylogenetic tree. Cluster 3 contained nine genes with the highest expression at the booting stage that gradually declined in subsequent stages. Clusters 4, 5, 6 and 7 comprised one gene each, each exhibiting a distinct expression pattern; the cluster 6 gene exhibited the highest expression in the second and third internode stages. 
Genes of both clusters 5 and 6 exhibited the highest expression in the last internode stage, whereas genes of clusters 4 and 7 exhibited the highest expression at the booting stage (Fig. ). Clusters 8 and 9 contained the largest numbers of genes, with 22 genes exhibiting the cluster 8 pattern and 21 genes exhibiting the cluster 9 pattern (Fig. ). Cluster 8 genes showed the highest expression in internodes collected at the milky to soft dough stage, whereas cluster 9 genes exhibited a contrasting pattern, with peak expression in internodes collected at the booting and physiological maturity stages. Phylogenetic placement of the genes belonging to clusters 8 and 9 revealed an interesting pattern. Although different genes from the same clade commonly exhibited cluster 8 and cluster 9 patterns, most of the sorghum genes exhibiting the cluster 8 pattern grouped with experimentally characterized genes known to act as activators of phenylpropanoid biosynthesis, whereas those exhibiting the cluster 9 pattern grouped with known repressors. These results suggest that the genes belonging to clusters 8 and 9 play contrasting roles in primary and/or secondary metabolic pathways. Such opposite regulatory effects of R2R3-MYB genes have earlier been demonstrated in white spruce and Arabidopsis and have been proposed as a fine-tuning mechanism for the regulation of plant metabolites [101].
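
The expression filter (FPKM >1) and the contrast between the cluster 8 and cluster 9 patterns can be sketched as follows. This is a simple rule-based stand-in, not the clustering method used to produce the nine clusters above, which is not detailed in this section. It assumes that replicate values are averaged per stage and that a gene passes the filter when its mean FPKM exceeds 1 in at least one stage; the gene names, FPKM values, stage keys and pattern labels are hypothetical.

```python
from statistics import mean

STAGES = ("booting", "milky_to_soft_dough", "physiological_maturity")


def stage_means(replicates):
    """Average the replicate FPKM values for each stage, in STAGES order."""
    return [mean(replicates[stage]) for stage in STAGES]


def describe_pattern(means, threshold=1.0):
    """Return a coarse pattern label, or None if the gene fails the filter."""
    # Assumed filter: mean FPKM must exceed the threshold in at least one stage.
    if max(means) <= threshold:
        return None
    first, middle, last = means
    if first < middle < last:
        return "increasing"
    if first > middle > last:
        return "decreasing"
    if middle > first and middle > last:
        return "mid-stage peak"   # resembles the cluster 8 pattern
    if middle < first and middle < last:
        return "mid-stage dip"    # resembles the cluster 9 pattern
    return "other"


# Hypothetical FPKM values, two biological replicates per stage.
fpkm = {
    "geneA": {"booting": [2.0, 4.0], "milky_to_soft_dough": [10.0, 12.0],
              "physiological_maturity": [3.0, 5.0]},
    "geneB": {"booting": [20.0, 22.0], "milky_to_soft_dough": [4.0, 6.0],
              "physiological_maturity": [18.0, 20.0]},
    "geneC": {"booting": [0.2, 0.4], "milky_to_soft_dough": [0.5, 0.3],
              "physiological_maturity": [0.1, 0.3]},
    "geneD": {"booting": [1.0, 3.0], "milky_to_soft_dough": [5.0, 7.0],
              "physiological_maturity": [10.0, 14.0]},
}
patterns = {gene: describe_pattern(stage_means(reps)) for gene, reps in fpkm.items()}
```

With only three stages, rules of this kind separate the two contrasting profiles cleanly; a data-driven clustering approach would be needed to recover finer groupings such as the single-gene clusters.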

CONCLUSION

R2R3-MYB transcription factors (TFs) form one of the largest and most functionally heterogeneous transcription factor families in plants. They have been implicated in secondary cell wall development through regulation of the expression of genes involved in the synthesis of cell wall components and phenylpropanoids [102]. The specialized secondary compounds regulated by R2R3-MYB genes are important for conferring abiotic stress tolerance and resistance to pathogens and herbivores [103]. Using bioinformatic analyses coupled with in-depth expression profiling, we have generated a framework for hypothesizing R2R3-MYB gene functions in sorghum. Our results revealed selective expansion of specific clades attributed to gene duplications. Genes belonging to the same clade within the major clades showed diverse functions, indicating intra-clade functional diversification. While tissue-specific enrichment of sorghum R2R3-MYB genes provides candidates for engineering plant development, contrasting expression patterns in internodes of bioenergy and sweet sorghum provide targets for fine-tuning phenylpropanoid biosynthesis and conditioning biofuel syndrome in sorghum cultivars.