A Collaborative Classroom Investigation of the Evolution of SABATH Methyltransferase Substrate Preference Shifts over 120 My of Flowering Plant History.

Nicole M Dubs, Breck R Davis, Victor de Brito, Kate C Colebrook, Ian J Tiefel, Madison B Nakayama, Ruiqi Huang, Audrey E Ledvina, Samantha J Hack, Brent Inkelaar, Talline R Martins, Sarah M Aartila, Kelli S Albritton, Sarah Almuhanna, Ryan J Arnoldi, Clara K Austin, Amber C Battle, Gregory R Begeman, Caitlin M Bickings, Jonathon T Bradfield, Eric C Branch, Eric P Conti, Breana Cooley, Nicole M Dotson, Cheyone J Evans, Amber S Fries, Ivan G Gilbert, Weston D Hillier, Pornkamol Huang, Kaitlin W Hyde, Filip Jevtovic, Mark C Johnson, Julie L Keeler, Albert Lam, Kyle M Leach, Jeremy D Livsey, Jonathan T Lo, Kevin R Loney, Nich W Martin, Amber S Mazahem, Aurora N Mokris, Destiny M Nichols, Ruchi Ojha, Nnanna N Okorafor, Joshua R Paris, Thais Fuscaldi Reboucas, Pedro Beretta Sant'Anna, Mathew R Seitz, Nathan R Seymour, Lila K Slaski, Stephen O Stemaly, Benjamin R Ulrich, Emile N Van Meter, Meghan L Young, Todd J Barkman.

Abstract

Next-generation sequencing has resulted in an explosion of available data, much of which remains unstudied in terms of biochemical function; yet, experimental characterization of these sequences has the potential to provide unprecedented insight into the evolution of enzyme activity. One way to make inroads into the experimental study of the voluminous data available is to engage students by integrating teaching and research in a college classroom such that eventually hundreds or thousands of enzymes may be characterized. In this study, we capitalize on this potential to focus on SABATH methyltransferase enzymes that have been shown to methylate the important plant hormone, salicylic acid (SA), to form methyl salicylate. We analyze data from 76 enzymes of flowering plant species in 23 orders and 41 families to investigate how widely conserved substrate preference is for SA methyltransferase orthologs. We find a high degree of conservation of substrate preference for SA over the structurally similar metabolite, benzoic acid, with recent switches that appear to be associated with gene duplication and at least three cases of functional compensation by paralogous enzymes. The presence of Met in active site position 150 is a useful predictor of SA methylation preference in SABATH methyltransferases but enzymes with other residues in the homologous position show the same substrate preference. Although our dense and systematic sampling of SABATH enzymes across angiosperms has revealed novel insights, this is merely the "tip of the iceberg" since thousands of sequences remain uncharacterized in this enzyme family alone.
© The Author(s) 2022. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  enzyme evolution; paralog compensation; plant methyltransferase evolution; substrate preference evolution

Year: 2022; PMID: 35021222; PMCID: PMC8890502; DOI: 10.1093/molbev/msac007

Source DB: PubMed; Journal: Mol Biol Evol; ISSN: 0737-4038; Impact factor: 16.240


Introduction

Studies of enzyme function have historically been based on model species which are limited to only a few lineages throughout the tree of life. This approach has yielded many important insights and formed a firm foundation upon which to understand variation and evolution of novelty. Yet, ideally, enzyme activity would be determined from throughout the tree of life such that comprehensive understanding of structure–function relationships and the evolutionary mechanisms of functional change would emerge (Mandoli and Olmstead 2000; Chang et al. 2016). Indeed, comparative analyses of RUBISCO from over 100 distinct enzymes have helped to illuminate the nature of constraint on substrate discrimination (Bouvier et al. 2021). Although comprehensive studies were impractical 20 years ago, the genomic revolution has allowed for determination of sequences from nearly every major lineage of organisms. For example, the OneKP project generated transcriptomic data from more than 1,000 species of green plants (Leebens-Mack et al. 2019), yet most remain uncharacterized in terms of function. Such rich data sets provide unprecedented opportunities to experimentally study protein functional change among the many lineages that have colonized earth in a diversity of habitats. To begin to utilize the massive data now available, we have taken an approach that integrates learning with discovery by developing a lab-based class to test evolutionary biochemical hypotheses. In this case, genes were cloned from previously unstudied plants and heterologous protein expression was used to test predictions related to enzyme substrate preference shifts. The class was designed to improve understanding of general evolutionary principles, promote phylogenetic perspectives to understand protein functional change, develop expertise with essential lab skills, and provide a research experience with unknown outcomes (see Materials and Methods for additional details). 
The data set and discoveries we present were largely generated from these classroom research projects. Plants are well-known for their ability to produce a wide array of specialized metabolites including terpenoids, flavonoids, alkaloids, and phenylpropanoids that appear to have important and varied roles, particularly for interactions with the environment (Pichersky and Lewinsohn 2011). Methylation is a particularly common enzymatically catalyzed substitution in the biosynthetic pathways of these varied metabolites and often involves enzymes of the salicylic acid/benzoic acid/theobromine (SABATH) methyltransferase (MT) family (Noel et al. 2003). Numerous SABATH enzymes have been functionally characterized including those that methylate a wide array of carboxylic acids (Ross et al. 1999; Kolosova et al. 2001; Seo et al. 2001; Yang et al. 2006; Kapteyn et al. 2007; Varbanova et al. 2007; Murata et al. 2008; Hippauf et al. 2010; Petronikolou et al. 2018; Wu et al. 2018) as well as thiols and ring nitrogen atoms (Kato et al. 1996; Uefuji et al. 2003; McCarthy and McCarthy 2007; Zhao et al. 2012; Huang et al. 2016). The enzyme family appears particularly diverse in angiosperms with approximately 25–40 members in Arabidopsis and Oryza, whereas mosses and liverworts appear to have only approximately four to nine enzymes encoded in their genomes (D’Auria et al. 2003; Zhao et al. 2008, 2012; Zhang et al. 2019). The most widely studied SABATH enzyme is salicylic acid carboxyl MT (SAMT) which produces methyl salicylate (MeSA) by methylation of salicylic acid (SA) (Ross et al. 1999) (fig. 1). MeSA has been identified as a component of floral scent and fruit flavor, is emitted after insect herbivory and acts as a vital signaling molecule in plant immune response (Knudsen et al. 2006; Park et al. 2007; Tieman et al. 2010). SAMT was first identified and characterized as a fragrance-producing enzyme in flowers of Clarkia breweri (Onagraceae) (Ross et al. 1999). 
Orthologous SAMT enzymes have since been identified in Plantaginaceae (Scrophulariaceae), Apocynaceae, Solanaceae, Fabaceae, Theaceae, and Salicaceae and variation in terms of substrate preference for structurally related molecules has been shown (Negre et al. 2002; Pott et al. 2004; Effmert et al. 2005; Hippauf et al. 2010; Lin et al. 2013; Han et al. 2018). Whereas some SAMTs are very specific in terms of affinity for SA, several studies have demonstrated activity with the structurally similar compound, benzoic acid (BA) (fig. 1), as well as numerous other molecules (Pott et al. 2004; Hippauf et al. 2010). SAMT appears to be known only from angiosperms; however, recent studies of the SABATH family in the gymnosperm genus Picea and the liverwort genus Conocephalum identified enzymes that prefer to methylate SA (Chaiprasongsuk et al. 2018; Zhang et al. 2019). Since neither of these enzymes appears to be closely related to SAMT in angiosperms, it is difficult to predict substrate preference from phylogenetic placement within the SABATH family tree, or primary amino acid sequence, alone.
Fig. 1.

Representative structures recognized as substrates by the SABATH family of MT enzymes. Most characterized family members are carboxyl MTs and usually exhibit a high degree of preference for one substrate.

Given the critical role for MeSA in pathogen and herbivore defense documented from model plants, it would be expected that species lacking an SAMT to produce the metabolite would be at a selective disadvantage. Yet, MeSA is only known to be produced from a few handfuls of species that are distantly related (Knudsen et al. 2006). Therefore, it remains to be determined if SAMT orthologs are found throughout the land plant lineage and whether or not specialized enzymatic activity for MeSA production has been maintained over time or has evolved multiple times independently by convergence. SAMT orthologs appear to be absent from the genomes of two model organisms, Arabidopsis thaliana and Oryza sativa (Chen et al. 2003; Zhao et al. 2010). In those species, benzoic/salicylic acid MT (BSMT), which may be paralogous to SAMT, is capable of BA as well as SA methylation, consistent with the evolutionary mechanism of paralog functional compensation (Hanada et al. 2011; Diss et al. 2014). This makes the extent to which the function of any SABATH family member would be conserved and predictably involved in the methylation of SA unclear. Crystallographic characterization of SAMT from C. breweri yielded insight into the active site residues that bind the methyl donor, S-adenosylmethionine, as well as those that serve to orient SA within the binding pocket to allow for methyl transfer (Zubieta et al. 2003). Mutagenesis of one of these sites, Met150, in SAMT from Datura wrightii back to the putatively ancestral His showed an approximately 10-fold reduction of preference for SA over BA, thereby implicating this residue in substrate discrimination (Barkman et al. 2007). 
Subsequent studies used ancestral sequence resurrection and mutagenesis to show the evolution of an early preference shift toward SA was governed by a change from His150 to Met early in the history of the SAMT enzyme lineage (Huang et al. 2012). Additional studies have mutated Met150 to His (or vice versa) in several modern-day sequences and shown altered preference for SA (Zubieta et al. 2003; Han et al. 2018). Thus, this active site residue may be a robust predictor of substrate preference but it remains unclear how widespread it is since so few angiosperms have been investigated. Although the functional effects of amino acid substitutions on the evolution of protein function have been studied in SAMT and other plant enzymes for a few focal species (Kaltenegger et al. 2013; Smith et al. 2013), comparative approaches conducted across numerous diverse flowering plant lineages may reveal important variation and increase power for statistical analyses. A comprehensive gene family history estimate coupled with functional characterization of enzymes has the potential to answer many long-standing questions regarding the evolution of SABATH activity in land plants. Specifically, our objectives in this study were to investigate: 1) how widely SAMT orthologs are found across land plants and to what extent gene duplication and/or loss occurred, 2) how enzyme preference for SA over BA has evolved over 120 My of angiosperm phylogeny in both SAMT orthologs as well as functionally compensating paralogs, and 3) whether Met150 is an accurate predictor of substrate preference throughout the SABATH enzyme family. To do this, we performed phylogenetic analyses of approximately 1,500 SABATH sequences and experimentally assayed enzyme function for more than 70 species from throughout the angiosperm tree of life in a college laboratory classroom setting to illustrate that students can meaningfully contribute to the generation of functional data to study protein evolution.

Results and Discussion

Angiosperm SABATH Enzymes That Methylate SA and/or BA Belong to Three Lineages: SAMT, BAMT/BSMT, and Xanthine Alkaloid MT

A maximum likelihood analysis of 1,578 SABATH sequences based on data from 62 complete genomes, in addition to functionally characterized cDNA sequences from select members of various clades, resulted in a single tree that was rooted with green algae sequences (log likelihood −328269.41; fig. 2). Overall, branch support is low among the first diverging lineages making it difficult to draw confident conclusions about the most ancient branching patterns in the multigene family; however, there are several observations that are clear regarding the origins of SA- and BA-methylating enzymes. First, all known angiosperm enzymes that methylate SA and BA appear to be part of a well-supported clade (bootstrap proportion [BP] = 100) that is diversified in terms of substrate preference (fig. 2). Within this large group of enzymes is the SAMT lineage (BP=99), for which preference for SA has been shown (Ross et al. 1999). SAMT is closely related to a clade of enzymes that includes benzoic acid MT (BAMT), first studied in Antirrhinum, and BSMT from Arabidopsis (BP=98), which has activity with both BA and SA (Murfitt et al. 2000; Chen et al. 2003). Sister to the BAMT/BSMT lineage (BP<50) is the monocot BSMT clade (BP=100), within which enzymes with preference for benzoic acid, anthranilic acid, and SA have been reported (Kollner et al. 2010; Zhao et al. 2010). Also apparently related to SAMT and BAMT/BSMT is the xanthine alkaloid MT (XMT) clade (BP=100) (fig. 2). This lineage includes enzymes that produce theobromine and caffeine in Citrus and Coffea as well as one from Mangifera that has high activity with BA, but not SA (Kato and Mizuno 2004; Huang et al. 2016). Second, the Picea SABATH enzyme that has been shown to have preference for SA (Chaiprasongsuk et al. 2018) is part of a clade of gymnosperm sequences (BP=100) sister to diverse flowering plant enzymes that include SAMT (fig. 2). Third, the Conocephalum MT that shows activity with SA (Zhang et al. 2019) is part of a lineage of liverwort sequences (BP=91) that likely diverged early in the history of the SABATH enzyme family but is not closely related to angiosperm SAMT (fig. 2). Whether these nonangiosperm enzymes represent cases of convergent gains of SA substrate preference or indicate that SA methylation is ancestral for land plants remains to be determined.
Fig. 2.

A phylogenetic analysis of 1,578 SABATH protein MT sequences shows overall enzyme family relationships. Enzyme names shown in bold have been demonstrated to methylate SA and/or benzoic acid (BA). Angiosperm enzymes that methylate SA and/or BA are found in the SAMT, BAMT, BSMT, and XMT clades. The gymnosperm and liverwort enzymes that methylate SA are not closely related to those of angiosperms. Bootstrap support is shown for selected nodes that separate the SA and BA-methylating clades of enzymes from others. SAMT, salicylic acid MT; BAMT, benzoic acid MT; BSMT, benzoic/salicylic acid MT; JMT, jasmonic acid MT; XMT, xanthine alkaloid MT; CS, caffeine synthase; IAMT, indole acetic acid MT; GAMT, gibberellic acid MT; FAMT, farnesoic acid MT; LAMT, loganic acid MT.

SAMT Is an Ancient Gene Lineage That Has Been Retained in Most Orders of Flowering Plants

A phylogenetic analysis of 629 translated genomic and transcriptomic SAMT sequences, including those from 76 experimentally characterized enzymes (see below), was performed using a maximum likelihood approach (log likelihood −93006.968; fig. 3). The estimated relationships among the SAMT proteins from 37 orders of flowering plants are largely as expected with “basal angiosperm” and “basal eudicot” orders estimated as early diverging and most eudicot sequences included in the large Rosid and Asterid clades (fig. 3). This widespread distribution and branching pattern of apparent SAMT orthologs suggests that the enzyme was present in early angiosperms and has been maintained over the approximately 120+ My history of flowering plants. Yet, at least three lineages are notable exceptions that appear to lack SAMT orthologs. First, no orthologous monocot SAMT sequences were obtained even though we queried genomic and transcriptomic data sets of numerous species. Thus, it would appear that SAMT was lost from the monocot lineage early during its divergence. Second, despite queries of multiple genome sequences and transcriptome data, no SAMT orthologs were recovered from Brassicaceae, the best-studied family of Brassicales, to which the model organism A. thaliana belongs. However, an SAMT is present in Carica (papaya), an early branching member of Brassicales (family Caricaceae) (fig. 3), which indicates that the gene was lost later during the history of the order. Third, the absence of SAMT in complete genome sequences of three members of Rosaceae (Fragaria, Prunus, and Malus) suggests loss of the gene from the rose family because orthologs have been isolated from several other members of Rosales, the order to which it belongs (fig. 3). Lineage-specific gene deletion was also reported for GAMT (see fig. 2) which was shown to be lost numerous times throughout angiosperm history (Zhang et al. 2020). 
Of the 12 other orders/lineages that appear to be missing SAMT, genome sequences remain uncharacterized, making it difficult to discern whether they possess an ortholog or not. However, in the case of Geraniales, which will be discussed below, transcriptome data indicate that Geranium expresses numerous other SABATH sequences; yet, none are orthologous to SAMT.
Fig. 3.

A phylogenetic estimate of SAMT sequences shows that most orders of angiosperms possess orthologs except for monocots. Because SAMT has been retained in several basal angiosperms, basal eudicots, rosids and asterids, orthologs are predicted to have been present in ancestral angiosperms. Lineages marked in orange include at least one enzyme for which functional analyses have been performed as shown in figure 4. Nodes for orders supported by bootstrap proportions >90 are marked by filled blue circles.
Fig. 4.

A phylogenetic context for enzyme substrate preference in SAMT, BAMT, BSMT, and XMT enzymes. Horizontal bars show relative enzyme preference for SA and BA. Enzyme names marked with “*” are adapted from published literature. Phylogenetic relationships among enzymes are shown with branches colored according to estimated/observed active site amino acid residue. Although every enzyme with Met at position 150 prefers to methylate SA, those with His prefer BA or another substrate. Enzymes with Gln may show preference for one or the other substrate. Pie charts at selected nodes show probabilities for estimated ancestral amino acid state.

Although some orders appear to have lost SAMT orthologs, within others it is apparent that recent family- or genus-specific duplications/proliferations have occurred. For instance, SAMT appears to have duplicated early in Solanaceae family history since most sampled species appear to possess at least two divergent copies (fig. 3), as has been reported previously (Barkman et al. 2007). Likewise, Ipomoea (Convolvulaceae), also in Solanales, appears to have independently experienced duplication of SAMT since it expresses at least two SAMT-type sequences (fig. 3). Recent SAMT-specific duplications are also inferred from the estimated branching patterns in Ricinus (Euphorbiaceae) in Malpighiales (fig. 3). These particular duplications are highlighted because we report on their functional assays below. Quantifying the precise number of recent duplicated genes in each lineage is challenging given the combination of both genomic and transcriptomic sequences in our analysis so, although there appear to be others within various families (e.g., Fabaceae and Vitaceae), we do not describe them further.

Relative Enzyme Preference for SA over BA Appears to Be Anciently Conserved in Flowering Plant SAMT

Relative substrate preference for SA and BA has been characterized for 76 enzymes from the SAMT lineage that were isolated from 61 genera, 40 families, and 23 orders of angiosperms (fig. 4 and fig. 3, branches highlighted in orange). Fifty-six of these enzymes are newly reported (supplementary table S1, Supplementary Material online) and expand the sampling beyond the six previously sampled families; they were generated in the upper-division undergraduate/first-year graduate level, single-semester college course we implemented to study enzyme evolution. Genes were cloned from previously unstudied plant species by extracting RNA from various tissues, using bioinformatics to design primers, and performing reverse transcriptase-polymerase chain reaction (RT-PCR) for cDNA amplification (supplementary table S1, Supplementary Material online). Not all plant tissues yielded usable high-quality RNA or expressed the SAMT gene, which necessitated switching to a different tissue/species when time permitted. In approximately 60% of cases, cDNA was obtained and successfully cloned and enzyme activity was detected from the heterologously expressed protein. In the majority of assays, SA is the preferred substrate across the flowering plant lineage (fig. 4). 
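As an illustration of how a relative preference such as that plotted in figure 4 could be derived, the sketch below scales raw activities so that the best substrate equals 100%. This is a minimal example under stated assumptions: the percent-of-maximum scaling, the function name `relative_preference`, and the peak-area values are hypothetical and are not taken from the study's protocol or data.

```python
def relative_preference(activities):
    """Scale raw activities so the most active substrate equals 100 percent.

    `activities` maps substrate name -> raw activity (e.g., a GC-MS peak area).
    The percent-of-maximum scaling is an assumption made for illustration.
    """
    top = max(activities.values())
    if top <= 0:
        raise ValueError("no detectable activity with any substrate")
    return {substrate: 100.0 * value / top
            for substrate, value in activities.items()}


# Hypothetical peak areas for one enzyme; not data from the study.
example = {"SA": 8000.0, "BA": 2000.0}
scaled = relative_preference(example)
preferred = max(scaled, key=scaled.get)
print(scaled, preferred)
```

With these made-up values the enzyme would be scored as SA-preferring, with BA activity at one quarter of the SA activity.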
The fact that most SAMT enzymes have preference for SA, including those from the early diverging basal angiosperms, Persea, Sassafras, Calycanthus, Magnolia, and Annona, as well as those from more recently diverged Asterids and Rosids, strongly suggests that this enzymatic trait has been conserved for at least 120 My in angiosperms. Such conservation of function indicates an important role for MeSA production in flowering plants, as has been demonstrated in the few plants for which its biosynthesis has been experimentally altered (Park et al. 2007; Chen et al. 2019). Nonetheless, some enzymes within the SAMT lineage deviate and exhibit a preference for BA over SA including relatively recent duplicates isolated from Nicotiana (Hippauf et al. 2010), Ipomoea, and Ricinus (fig. 4). In these cases, each genus possesses one enzyme with substrate preference for SA and a second that shows higher preference for BA (fig. 4). Additionally, SAMT orthologs with preference for o-anisic acid or nicotinic acid (see fig. 1 for structures) have been reported in Nicotiana (Hippauf et al. 2010). Some degree of enzymatic neofunctionalization is not surprising in these cases given that one duplicate appears to be preserved for SA methylation thereby allowing others to evolve alternative substrate preferences. The only other exceptional SAMTs include one from Nelumbo that prefers to methylate BA over SA and one enzyme from Medicago which appears to have recently gained the preferential methylation of anthranilic acid over either BA or SA (Pollier et al. 2019). The genomes from both of these genera encode additional SAMT duplicates that remain uncharacterized.

SABATH Paralogs Appear to Functionally Compensate for Lack of SAMT Orthologs in at Least Three Angiosperm Lineages

Due to the apparent conserved preference for SA methylation across the angiosperm lineage, it is, perhaps, not surprising that loss of SAMT in specific lineages has been compensated for by SABATH paralogs. First, even though no SAMT ortholog is known from monocots, it has been previously shown that BSMT in Oryza exhibits preference for SA over BA (Koo et al. 2007; Zhao et al. 2010). Activity with SA and BA has likely been maintained since relatively early in the history of the monocot lineage because the BSMT orthologs we characterized from Musa and Phalaenopsis (Zingiberales and Asparagales, respectively) show activity with both substrates (fig. 4) as does a previously reported one from Lilium (Liliales) (Wang et al. 2015). Second, the phylogenetic analysis and characterization of a Carica SAMT sequence (fig. 4) indicates that orthologs were present in the ancestor of Brassicales and capable of SA methylation but were later lost some time after the divergence of Arabidopsis (and likely all of Brassicaceae since orthologs appear to be missing from all sequenced genera in the family). Nonetheless, the paralogous Arabidopsis BSMT appears to compensate for the loss of SAMT orthologs because, although it prefers to methylate BA (fig. 4), the enzyme demonstrates detectable in vitro and in planta activity with SA (Chen et al. 2003). Third, two members of Rosaceae, Malus and Prunus, are lacking SAMT coding sequences even though other members of Rosales (Morus, Ficus, and Humulus) have functional orthologs preferring to methylate SA (fig. 4). Instead, these two genera possess XMT-type enzymes that preferentially catalyze the formation of methyl salicylate (fig. 4). These enzymes are most closely related to those that are otherwise known for xanthine alkaloid methylation in Coffea and Citrus (McCarthy and McCarthy 2007; Huang et al. 2016). Finally, it appears that Geranium also expresses an XMT-type enzyme that shows maximal activity with SA (fig. 4). 
However, whether the XMT-type enzyme compensates for lack of SAMT is unclear since no genomic sequence is available for that genus or the order Geraniales to which it belongs. Our findings of functional compensation by close paralogs to maintain MeSA production reinforce conclusions from Arabidopsis knock-out mutants (Hanada et al. 2011).

Met150 Is a Reliable Predictor of Substrate Preference

Because previous studies have shown an important functional role for Met replacement of His150 of the ancestral SABATH enzyme (Node A in fig. 4), we predicted that SA substrate preference would be associated with Met150 (Barkman et al. 2007; Huang et al. 2012). Yet, to our surprise, Met was found in the active site of only 41 out of 66 of the SAMT-type enzymes showing SA methylation preference (fig. 4). The other 25 SA-preferring SAMT enzymes have Gln in the homologous position 150. Previous analyses of Huang et al. (2012) that inferred a mechanistic role of Met for the evolution of SA preference in the ancient enzyme of Node B (fig. 4) had fewer orthologs available for comparative analyses and only included Clarkia as a member of the Rosid clade, which has Met like most Asterids (fig. 4). Now, with this broad sampling of Rosid sequences, ancestral state estimates indicate that Gln, not Met, replaced His in the ancestor of core eudicots (Node B) and was retained by nearly all Rosids with subsequent parallel substitution to Met separately in Myrtales and Malpighiales (fig. 4). Even though Met150 may not be responsible for the evolution of SA preference in the progenitor of the SAMT lineage as hypothesized by Huang et al. (2012), it remains a robust predictor of substrate preference since every enzyme with Met in the active site prefers to methylate SA (fig. 4). Furthermore, a phylogenetic correlation test (Pagel 1994) revealed that a model assuming dependence of SA preference on evolving Met at active site position 150 was a significantly better fit to the data than the null model of no association (P = 0.05). Under the model of trait dependence, the rate of gain of SA preference after evolving Met in the active site was 17 times higher than the rate estimated under the model of independent evolution of active site residue and substrate preference (supplementary fig. S1, Supplementary Material online). 
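A comparison of dependent and independent trait-evolution models of this kind is commonly evaluated with a likelihood ratio test. The sketch below shows that arithmetic only; it is not the analysis pipeline used in the study. It assumes the usual setup for Pagel's (1994) test, in which the independent model has four rate parameters and the dependent model has eight, so the test statistic is compared with a chi-square distribution with 4 degrees of freedom (Monte Carlo simulation is an alternative not shown here). The log-likelihood values are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # For df = 2k: sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_independent, lnl_dependent, df=4):
    """Return (test statistic, P value) for nested models; df=4 is an assumption."""
    stat = 2.0 * (lnl_dependent - lnl_independent)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods; not values from the study.
stat, p_value = likelihood_ratio_test(-50.0, -45.0)
print(stat, p_value)
```

A small P value would favor the dependent model, that is, an association between the active site residue and substrate preference along the tree.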
The higher estimated rate may be due to the two independent substitutions of Met150 in the paralogous BAMT/BSMT and XMT lineages that appear coincident with the evolution of SA preference (fig. 4). Specifically, although the monocot BSMT clade of enzymes appears to have ancestrally had His in active site position 150, at least one historical shift to Met is estimated such that the descendant enzymes in Phalaenopsis, Musa, and Oryza evolved preference for SA over BA (fig. 4). Likewise, in the XMT clade, Met apparently replaced His in the common ancestor of Prunus and Malus (both Rosaceae), as well as Geranium, and their enzymes prefer to methylate SA (fig. 4). For context, our survey of the 1,578 SABATH sequences of figure 2 shows that across land plants, 69.6% have His, 13.6% have Gln, 9.7% have Met, 3.1% have Asn, whereas 3.9% show other amino acids at active site position 150. Of the 9.7% sequences with Met, all are found in the SAMT/BSMT/XMT clades. Thus, the association of Met with these SA-preferring enzymes appears to be nonrandom. In fact, a phylogenetic ANOVA (Garland et al. 1993) found a significant difference in the level of SA preference in enzymes with Met compared with His; however, the difference of SA preference between enzymes with Met versus Gln was not significant at P < 0.05 (fig. 5 and supplementary table S2, Supplementary Material online).
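A survey of residue frequencies at a single active site position, like the one summarized above, can be sketched as a tally over one column of a protein alignment. The function name `residue_frequencies` and the toy alignment are hypothetical. Note also that position 150 presumably follows the numbering of a reference sequence, so in practice it would first have to be mapped to the corresponding alignment column, a step not shown here.

```python
from collections import Counter


def residue_frequencies(aligned_seqs, column):
    """Percent of sequences carrying each residue at a 0-based alignment column.

    Sequences with a gap ('-') at that column are skipped.
    """
    residues = [seq[column] for seq in aligned_seqs if seq[column] != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in Counter(residues).items()}


# Toy alignment for illustration only; not sequences from the study.
toy_alignment = ["AHK", "AHK", "AQK", "AMK", "A-K"]
print(residue_frequencies(toy_alignment, 1))
```

In this toy example, half of the ungapped sequences carry His at the chosen column and a quarter each carry Gln and Met.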
Fig. 5.

Average level of SA preference for enzymes with different amino acid residues in active site position 150. Phylogenetic ANOVA results indicate that enzymes with Gln and Met are not significantly different from one another. However, enzymes with His in the active site have significantly lower SA methylation preference than those with either Gln or Met (P = 0.003 in both cases).

The phylogenetic ANOVA also showed a significant difference in SA preference for enzymes with Gln in the active site relative to those with His (fig. 5 and supplementary table S2, Supplementary Material online). However, the association of Gln at position 150 with SA preference appears to be context-dependent. There are 32 enzymes with Gln in figure 4 but seven of those show preference for other substrates. For instance, the duplicate SAMT-type sequence with Gln150 from Ipomoea prefers to methylate BA as compared with SA (fig. 4). In addition, the Medicago truncatula enzyme orthologous to SAMT preferentially methylates anthranilic acid even though it exhibits Gln in the active site (Pollier et al. 2019). In contrast, of the 12 enzymes shown in figure 4 that have His in the active site position 150, none prefer to methylate SA. Yet, the SA-preferring enzymes from the liverwort, Conocephalum (Zhang et al. 2019), and the gymnosperm, Picea (Chaiprasongsuk et al. 2018), both possess His at the homologous position in the active site. These results strongly imply that additional amino acid positions, perhaps near the active site, provide altered substrate preference. Dissecting the evolutionary basis for substrate preference shifts in these SABATH enzymes will likely require ancestral sequence resurrection coupled with mutagenesis (Thornton 2004) and integration of structural, computational, and experimental data (Torrens-Spence et al. 2020). 
It is clear that the increased number of enzymes assayed here has advanced our understanding of the SABATH family in terms of shifts of substrate preference and associations with particular active site residues. Further experimental study of many more SABATH family members will likely provide finer-scale resolution of the relationship of active site configuration and substrate preference and has the potential to yield additional insights into how functional diversification among paralogous enzymes evolves. Although computational approaches may assist protein functional annotation broadly (Furnham et al. 2012), engagement of as many students as possible in the process of experimental investigations could prove to be a fruitful way to understand the evolution of enzymatic properties and empirically assay the voluminous sequences that remain uncharacterized. This study demonstrates one possible outcome of bringing hands-on scientific inquiry into the college classroom for the purpose of understanding the functional significance of sequence variation across the Tree of Life.

Materials and Methods

Implementation of Collaborative Classroom Investigations

The overall goal of each semester-long class was to provide an integrative scientific experience that started with a living plant and ended up with metabolomic data obtained from gas chromatography-mass spectrometry (GC-MS) analyses to test hypotheses about the degree to which historical amino acid substitutions in SABATH proteins might promote enzyme functional evolution. Three major evolutionary concepts were the focus for the course. First, natural selection may impose variable levels of constraint upon different parts of proteins. All target sequences were manually aligned to a data set with representatives from each lineage of the SABATH family of enzymes. From this, it was demonstrated that although proteins from different species exhibit some sites that were hard to align and may be neutral, others are highly conserved and align easily because they presumably are under strong purifying selection such that they rarely mutate. Second, gene duplication and phylogenetic divergence can generate protein diversity. Subsequent to alignments, phylogenetic analyses of candidate proteins were performed in order to assess orthology/paralogy and make predictions about enzyme activity based on relatedness to proteins with demonstrated functions. Third, protein functional changes arise from historical amino acid replacements. This was investigated by mapping enzyme relative substrate preferences from the classroom investigations and published literature onto the phylogenetic tree and assessing variation relative to active site amino acid changes that may be correlated. In order to collect novel data, basic principles of bioinformatics, primer design, RNA extraction, RT-PCR, cloning, transformation, protein expression, enzyme assays, and GC-MS analysis of metabolites were introduced and protocols both from the literature and commercially available kits were provided and are described in relevant sections below. 
These protocols were implemented independently for different taxa and lab notebooks documented all results, as expected for any scientific study. In any given semester, lab reports were written based upon analyzed and pooled class data that were interpreted relative to published literature. Through several semesters of collaboration, the classroom investigations revealed novel results, including previously unknown paralog compensation, the correlation of Met150 with SA preference in XMT orthologs, and the finding that genus-specific SAMT duplicates may partition substrate preferences. In addition, these studies pointed to the apparent equivalence of Gln150 and Met150 for predicting SA preference of most SAMTs. The generation of the novel data reported here also provided the opportunity to participate in all steps of the peer-review publication process related to this manuscript.

Phylogenetic Methods

Trees were estimated from amino acid translations of a combination of genome and transcriptome sequences obtained from GenBank, 1KP, and Phytozome databases using the C. breweri SAMT (Acc. No. AF133053.1) as a query. Additional genome sequences were obtained through a BLAST search of the Ginkgo biloba (Guan et al. 2016), Picea abies (Nystedt et al. 2013), Azolla filiculoides, and Salvinia cucullata (Li et al. 2018) genome repositories. Sequences were aligned using MAFFT (Katoh and Standley 2013) with default parameters. Trees and branch support estimates were calculated using IQTree (Nguyen et al. 2015) via the CIPRES Science Gateway. The JTT model of amino acid substitution and number of free rate categories were determined by IQTree’s ModelFinder algorithm (Kalyaanamoorthy et al. 2017), and branch support values were generated using IQTree’s ultrafast bootstrapping method (Hoang et al. 2018). Ancestral states were estimated using the “-asr” option in IQTree. Statistical analyses of the dependence of SA preference upon Met in active site position 150 were performed across the lineages of figure 4 according to the models of Pagel (1994) as implemented in Mesquite (Maddison and Maddison 2021). The dependent model with eight rate parameters was compared with the six-parameter model in which q12 and q34 were constrained to be equal, as were q21 and q43 (supplementary fig. S1, Supplementary Material online). Significance of the test was assessed using 1,000 simulations. In order to test for significant differences in relative substrate preference for SA according to the active site residue at position 150 (Met, Gln, or His) shown for enzymes in figure 4, we used phylogenetically explicit ANOVA (Garland et al. 1993) implemented in the R package “phytools” using the function phylANOVA (Revell 2012). 
For the phylogenetic ANOVA, we performed 1,000 simulations and the relative substrate preference values were log10 transformed to reduce the effect of differences in magnitude.
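The arithmetic behind the log10 transformation and a simulation-based P-value can be illustrated with a minimal Python sketch. It is not the phytools implementation: the tree-based simulation of trait evolution that phylANOVA performs is not reproduced, so the null F statistics must be supplied by the caller. The function names and the example ratios are hypothetical and are not data from this study.

```python
import math

def log10_transform(values):
    """Log10-transform positive product ratios to reduce the effect of magnitude."""
    return [math.log10(v) for v in values]

def f_statistic(groups):
    """Ordinary one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def simulation_p_value(observed_f, simulated_fs):
    """Fraction of simulated (null) F statistics at least as large as the observed one."""
    return sum(f >= observed_f for f in simulated_fs) / len(simulated_fs)

# Hypothetical MeSA:MeBA ratios grouped by the residue at position 150
# (illustrative values only, not measurements from this study).
example_ratios = {
    "Met": [20.0, 55.0, 8.0],
    "Gln": [15.0, 30.0, 4.0],
    "His": [0.1, 0.05, 0.4],
}
observed_f = f_statistic([log10_transform(v) for v in example_ratios.values()])
```

In a phylogenetically explicit analysis, the list passed to `simulation_p_value` would hold F statistics computed from traits simulated along the tree, which is the step handled by the R function in the actual analysis.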

Gene Isolation and Amplification

All plant tissues were obtained from the Western Michigan University Finch greenhouse or campus grounds. Roughly 100 mg of fresh tissue was flash-frozen using liquid nitrogen and subsequently ground into a fine powder using a sterile pestle. The tissue extracted for each species is listed in supplementary table S1, Supplementary Material online. RNA extractions were then performed using the RNeasy Plant Mini Kit (Qiagen Inc., Valencia, CA) or the Spectrum Plant Total RNA kit (Sigma–Aldrich, St. Louis, MO), according to the manufacturers’ protocols. RNA was eluted using 40 µl sterile RNase-free water and stored at −80 °C. RNA extraction quality was confirmed via electrophoresis in a 1.5% agarose gel and concentration was determined using a NanoDrop spectrophotometer (ThermoFisher Scientific, Waltham, MA). To obtain primers for RT-PCR of putative SAMT orthologs, the full amino acid sequence from C. breweri SAMT (GenBank accession number AF133053.1) was used as a query for BLAST searches (TBlastN) of the GenBank nonredundant and transcriptome shotgun assembly databases, as well as the OneKP data set from China National Gene Bank. Forward and reverse primers were subsequently designed from full-length sequences to amplify the cDNAs of the SAMT genes. If DNA sequence data were not available for the species chosen, primers were designed from sequences of congeners. All primer sequences are listed in supplementary table S1, Supplementary Material online. To obtain cDNA, the Superscript III One-Step RT-PCR System with Platinum Taq (Invitrogen, Carlsbad, CA) was used for RT-PCR. Each reaction consisted of 0.5 µl of each primer (10 µM), 12.5 µl Invitrogen 2× reaction mix, 1 µl SSIII Platinum Taq mix, 500 ng total RNA, and RNase-free water to bring the reaction to a final volume of 25 µl. 
RT-PCR utilized the following conditions: reverse transcription at 50 °C for 10 min, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing at 50 °C for 30 s and extension at 72 °C for 70 s. The final extension was held for 25 min at 72 °C. Accession numbers for the cDNA sequences encoding each assayed enzyme are provided in supplementary table S1, Supplementary Material online.
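The reaction setup described above reduces to simple volume arithmetic once the RNA concentration is known. The following Python sketch, offered only as an illustration, computes the RNA and water volumes for a 25 µl reaction and the programmed cycling time. The function names are hypothetical, the RNA concentration is an input that would come from the NanoDrop reading, and ramp times between temperatures are ignored.

```python
def rt_pcr_mix(rna_ng_per_ul, rna_ng=500.0, total_ul=25.0):
    """Return component volumes (in µl) for one RT-PCR reaction."""
    fixed = {
        "forward primer (10 µM)": 0.5,
        "reverse primer (10 µM)": 0.5,
        "2x reaction mix": 12.5,
        "SSIII Platinum Taq mix": 1.0,
    }
    rna_ul = rna_ng / rna_ng_per_ul
    water_ul = total_ul - sum(fixed.values()) - rna_ul
    if water_ul < 0:
        # The RNA is too dilute for 500 ng to fit in the remaining volume.
        raise ValueError("RNA volume exceeds the space left in the reaction")
    return {**fixed, "total RNA": rna_ul, "RNase-free water": water_ul}

def programmed_minutes(rt_min=10.0, cycles=40, denature_s=30, anneal_s=30,
                       extend_s=70, final_extension_min=25.0):
    """Total programmed time in minutes, excluding ramping between steps."""
    cycling_min = cycles * (denature_s + anneal_s + extend_s) / 60.0
    return rt_min + cycling_min + final_extension_min

mix = rt_pcr_mix(rna_ng_per_ul=250.0)  # hypothetical NanoDrop reading
```

With the stated volumes, 10.5 µl remain for RNA plus water, so 500 ng of RNA fits only if the concentration is at least about 48 ng/µl; the sketch raises an error below that threshold.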

Transformation and Protein Expression in Escherichia coli

Following amplification, cDNA was cloned into the pTrcHIS vector using the pTrcHIS TOPO TA Expression Kit or into the pBAD vector using the pBAD TOPO TA kit (both from Invitrogen, Carlsbad, CA). The TOPO TA cloning kits were chosen because they allow for rapid cloning in a college classroom setting. Ligation of cDNA into the pTrcHIS and pBAD vectors and subsequent transformation into Top10 E. coli cells was carried out according to the manufacturer’s protocol. About 100 µl of the transformation mixture was pipetted onto LB plates containing 50 µg/ml ampicillin and incubated overnight at 37 °C. Colonies were screened for full-length inserts by PCR, using the pTrcHIS or pBAD forward and reverse primers. Each PCR reaction contained 0.5 µl of each primer (10 µM), 12.5 µl Qiagen Taq 2× PCR master mix (Qiagen Inc., Germantown, MD), and water to bring the reaction volume up to a total of 24.5 µl. Bacteria were transferred into the reaction tube using a sterile pipette tip (estimated volume 0.5 µl). Products containing full-length inserts were purified using the QIAGEN QIAquick PCR Purification Kit (Qiagen Inc., Germantown, MD) according to the manufacturer’s protocol. Purified product was submitted to Genewiz Corp. (South Plainfield, NJ) for Sanger sequencing to confirm sequence orientation and that full-length open reading frames were obtained. Pilot bacterial cultures were prepared in 5 ml LB medium with 50 µg/ml ampicillin from colonies containing full-length, sense-orientation sequences. These pilot cultures were incubated with shaking overnight at 37 °C. The following day, 48 ml of LB-amp was inoculated using 2 ml of the pilot culture and incubated at 37 °C until reaching an optical density of 0.6 at 600 nm. Protein expression was then induced by addition of IPTG to a final concentration of 1 mM for cultures transformed with pTrcHIS or arabinose to a final concentration of 0.2% for pBAD-containing cultures.
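The volumes in this section follow from standard dilution arithmetic, sketched below in Python. The stock concentrations (1 M IPTG and 20% arabinose) are assumptions made for illustration and are not stated in the protocol, the function names are hypothetical, and the small volume added by the inducer is ignored.

```python
def colony_pcr_water_ul(total_ul=24.5, primer_ul=0.5, master_mix_ul=12.5):
    """Water needed so that two primers plus master mix reach the stated volume."""
    return total_ul - 2 * primer_ul - master_mix_ul

def stock_volume_ml(final_conc, stock_conc, culture_ml):
    """Volume of stock to add (C1*V1 = C2*V2), ignoring the added volume itself."""
    return final_conc * culture_ml / stock_conc

culture_ml = 48.0 + 2.0            # LB-amp plus pilot culture inoculum
dilution_factor = culture_ml / 2.0  # fold dilution of the pilot culture

# Assumed stock concentrations (hypothetical, not given in the text):
iptg_ml = stock_volume_ml(final_conc=1.0, stock_conc=1000.0, culture_ml=culture_ml)   # mM
arabinose_ml = stock_volume_ml(final_conc=0.2, stock_conc=20.0, culture_ml=culture_ml)  # percent
```

Under these assumed stocks, a 50 ml culture would receive about 50 µl of IPTG or 0.5 ml of arabinose; other stock concentrations scale the volumes proportionally.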

Enzyme Assay with SA and BA

To determine substrate preference for heterologously expressed enzymes, we used the approach of Ross et al. (1999) as extended by Barkman et al. (2007). This approach assays enzyme activity with SA and BA directly from 50 ml E. coli LB cultures and was used for several reasons. First, it provides a direct measure of substrate preference since the substrates are provided in equimolar concentrations and are saturating. Indeed, our comparisons of substrate preference using the Barkman et al. (2007) method appear to be qualitatively comparable with kinetic estimates obtained from purified enzyme preparations for the cases we have examined. For example, we determined the ratio of specificity constants for SA and BA of the D. wrightii SAMT as follows: kcat/KM(SA):kcat/KM(BA) = ∼100. Using the method of Barkman et al. (2007) with D. wrightii SAMT, we obtained the following ratio of methylated products: MeSA:MeBA = ∼20. Therefore, both of these approaches indicate a clear qualitative preference for SA methylation over BA methylation. Second, several of the expressed proteins were not soluble when extracted; therefore, they could not be assayed by other methods such as the radioactive method of Huang et al. (2012), which measures activity only with single substrates. Third, given that most of the enzymes were isolated and assayed in the context of a one-semester college course, complete kinetic characterization using radioisotopes was impractical. For GC-MS assays following the Barkman et al. (2007) method, equimolar concentrations of SA and BA (0.2 mM final concentration) were added to the 50 ml LB cultures with 50 µg/ml ampicillin immediately after the IPTG was added and were shaken gently overnight at room temperature. These were then centrifuged at 4 °C for 10 min at 3,000×g to pellet cells. The supernatant was then extracted with 5 ml hexane to remove methylated products. The contents of hexane extracts were analyzed using GC-MS. 
The GC-MS analyses were performed on an HP6890 GC System coupled to an HP5973 Mass Selective Detector using a DB-5 capillary column. The oven conditions were 40 °C for 2 min, ramping 20 °C/min to 300 °C with a 2 min hold. Relative activity was determined by peak integration and comparing peak areas for each methylated product. Because we tested hypotheses related to relative substrate preference, using the same protein concentration in each assay was not required; in this case, providing competing substrates in equimolar concentrations is the most critical aspect. Replicate transformations were performed and mean and standard deviation of the ratio of products were determined. Importantly, 50 ml cultures in which an antisense vector was used for transformations served as negative controls.
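The calculation of relative substrate preference from integrated peak areas can be sketched as follows. This is an illustrative Python example rather than the analysis code used in the study: the function names and peak areas are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption because the text does not specify which estimator was used. The oven program duration is derived directly from the stated conditions.

```python
from statistics import mean, stdev

def product_ratio(mesa_area, meba_area):
    """MeSA:MeBA ratio from integrated peak areas of one replicate."""
    return mesa_area / meba_area

def summarize_replicates(peak_areas):
    """Mean and sample SD of MeSA:MeBA ratios across replicate transformations."""
    ratios = [product_ratio(mesa, meba) for mesa, meba in peak_areas]
    return mean(ratios), stdev(ratios)

def prefers_sa(ratio):
    """A ratio above 1 indicates more methyl salicylate than methyl benzoate."""
    return ratio > 1.0

def oven_program_minutes(start_c=40.0, end_c=300.0, ramp_c_per_min=20.0,
                         initial_hold_min=2.0, final_hold_min=2.0):
    """Total GC oven program time for a single linear ramp with two holds."""
    return initial_hold_min + (end_c - start_c) / ramp_c_per_min + final_hold_min

# Hypothetical (MeSA area, MeBA area) pairs for three replicate transformations.
replicates = [(200.0, 10.0), (300.0, 10.0), (100.0, 10.0)]
mean_ratio, sd_ratio = summarize_replicates(replicates)
```

Because only the ratio of the two peak areas within a run is used, differences in total protein or culture density between replicates cancel out, which is consistent with the rationale given above for not equalizing protein concentration.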

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.
  55 in total

1.  Evolution of homospermidine synthase in the convolvulaceae: a story of gene duplication, gene loss, and periods of various selection pressures.

Authors:  Elisabeth Kaltenegger; Eckart Eich; Dietrich Ober
Journal:  Plant Cell       Date:  2013-04-09       Impact factor: 11.277

2.  Convergent evolution of caffeine in plants by co-option of exapted ancestral enzymes.

Authors:  Ruiqi Huang; Andrew J O'Donnell; Jessica J Barboline; Todd J Barkman
Journal:  Proc Natl Acad Sci U S A       Date:  2016-09-20       Impact factor: 11.205

3.  Positive selection for single amino acid change promotes substrate discrimination of a plant volatile-producing enzyme.

Authors:  Todd J Barkman; Talline R Martins; Elizabeth Sutton; John T Stout
Journal:  Mol Biol Evol       Date:  2007-03-20       Impact factor: 16.240

4.  Herbivore-induced SABATH methyltransferases of maize that methylate anthranilic acid using s-adenosyl-L-methionine.

Authors:  Tobias G Köllner; Claudia Lenk; Nan Zhao; Irmgard Seidl-Adams; Jonathan Gershenzon; Feng Chen; Jörg Degenhardt
Journal:  Plant Physiol       Date:  2010-06-02       Impact factor: 8.340

5.  Enzymatic, expression and structural divergences among carboxyl O-methyltransferases after gene duplication and speciation in Nicotiana.

Authors:  Frank Hippauf; Elke Michalsky; Ruiqi Huang; Robert Preissner; Todd J Barkman; Birgit Piechulla
Journal:  Plant Mol Biol       Date:  2009-11-21       Impact factor: 4.076

6.  Molecular cloning and functional characterization of three distinct N-methyltransferases involved in the caffeine biosynthetic pathway in coffee plants.

Authors:  Hirotaka Uefuji; Shinjiro Ogita; Yube Yamaguchi; Nozomu Koizumi; Hiroshi Sano
Journal:  Plant Physiol       Date:  2003-05       Impact factor: 8.340

7.  An Arabidopsis thaliana gene for methylsalicylate biosynthesis, identified by a biochemical genomics approach, has a role in defense.

Authors:  Feng Chen; John C D'Auria; Dorothea Tholl; Jeannine R Ross; Jonathan Gershenzon; Joseph P Noel; Eran Pichersky
Journal:  Plant J       Date:  2003-12       Impact factor: 6.417

8.  The leaf epidermome of Catharanthus roseus reveals its biochemical specialization.

Authors:  Jun Murata; Jonathon Roepke; Heather Gordon; Vincenzo De Luca
Journal:  Plant Cell       Date:  2008-03-07       Impact factor: 11.277

9.  Structural basis for divergent and convergent evolution of catalytic machineries in plant aromatic amino acid decarboxylase proteins.

Authors:  Michael P Torrens-Spence; Ying-Chih Chiang; Tyler Smith; Maria A Vicent; Yi Wang; Jing-Ke Weng
Journal:  Proc Natl Acad Sci U S A       Date:  2020-05-05       Impact factor: 11.205

10.  Draft genome of the living fossil Ginkgo biloba.

Authors:  Rui Guan; Yunpeng Zhao; He Zhang; Guangyi Fan; Xin Liu; Wenbin Zhou; Chengcheng Shi; Jiahao Wang; Weiqing Liu; Xinming Liang; Yuanyuan Fu; Kailong Ma; Lijun Zhao; Fumin Zhang; Zuhong Lu; Simon Ming-Yuen Lee; Xun Xu; Jian Wang; Huanming Yang; Chengxin Fu; Song Ge; Wenbin Chen
Journal:  Gigascience       Date:  2016-11-21       Impact factor: 6.524
