
Genome-Wide Analysis of Terpene Synthase Gene Family in Mentha longifolia and Catalytic Activity Analysis of a Single Terpene Synthase.

Zequn Chen1, Kelly J. Vining2, Xiwu Qi1, Xu Yu1, Ying Zheng3, Zhiqi Liu4, Hailing Fang1,5, Li Li1, Yang Bai1, Chengyuan Liang1,5, Weilin Li6, Bernd Markus Lange7.   

Abstract

Terpenoids are a large and diverse class of natural products, and terpene synthases (TPSs) play a key role in their biosynthesis. Mentha plants are rich in essential oils, whose main components are terpenoids, and the corresponding biosynthetic pathways have been largely elucidated. However, TPSs in Mentha plants have not been systematically identified and studied. In this work, we performed a genome-wide identification and analysis of the TPS gene family in Mentha longifolia, a model plant for functional genomic research in the genus Mentha. A total of 63 TPS genes were identified in the M. longifolia genome sequence assembly, and these could be divided into six subfamilies. The TPS-b subfamily had the largest number of genes, which might be related to the abundant monoterpenoids in Mentha plants. The TPS-e subfamily had 18 members and showed a significant species-specific expansion compared with other sequenced Lamiaceae plant species. The 63 TPS genes mapped to nine scaffolds of the M. longifolia genome sequence assembly, and their distribution was uneven. Tandem and segmental duplications contributed greatly to the increase in the number of TPS genes in M. longifolia. The conserved motifs (RR(X)8W, NSE/DTE, RXR, and DDXXD) were analyzed in M. longifolia TPSs, and significant differentiation was found between subfamilies. Adaptive evolution analysis showed that M. longifolia TPSs were subjected to purifying selection after the species-specific expansion, and some amino acid residues under positive selection were identified. Furthermore, we cloned a single terpene synthase gene, MlongTPS29, which belongs to the TPS-b subfamily, and analyzed its catalytic activity. MlongTPS29 encodes a limonene synthase that catalyzes the biosynthesis of limonene, an important precursor of essential oils in the genus Mentha. This study provides useful information on the biosynthesis of terpenoids in the genus Mentha.

Keywords:  Mentha longifolia; limonene synthase; terpene synthase; terpenoids

Year:  2021        PMID: 33918244      PMCID: PMC8066702          DOI: 10.3390/genes12040518

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

Terpenoids are the largest and most structurally diverse group of natural products in plants [1]. To date, more than 80,000 terpenoid compounds, including monoterpenes, sesquiterpenes, and diterpenes, have been identified [2,3]. Terpenoids play important roles in both primary and secondary metabolism of plants. For example, gibberellins, brassinosteroids, and carotenoids are well-characterized terpenoids that play important roles in plant growth and development as plant hormones and photosynthetic pigments [4]. Only a small number of terpenoids are involved in primary metabolism; the majority are classified as secondary metabolites. Although these are not involved in the basic growth and development of plants, they still have physiological functions and a wide range of applications, including plant defense responses, pharmacological compounds, and fragrance and aroma constituents [5,6,7]. Although the number of terpenoids is huge, they are all derived biosynthetically from two common precursors, dimethylallyl diphosphate (DMAPP) and isopentenyl diphosphate (IPP) [8]. These precursors are produced by two biosynthetic pathways, the methylerythritol phosphate (MEP) pathway in the chloroplast and the mevalonate (MVA) pathway in the cytosol [9]. The condensation of DMAPP and IPP, catalyzed by prenyltransferases, produces the direct precursors geranyl diphosphate (GPP, C10), farnesyl diphosphate (FPP, C15), and geranylgeranyl pyrophosphate (GGPP, C20). Subsequently, terpene synthases (TPSs) convert these precursors into a variety of terpenoids, including hemiterpenes (C5), monoterpenes (C10), sesquiterpenes (C15), and diterpenes (C20) [10,11]. The products of TPSs can be further modified by other enzymatic reactions, such as dehydrogenation, isomerization, and group transfer. In the biosynthetic pathway of terpenoids, TPSs are positioned at the branch point and are key enzymes for terpenoid biosynthesis.
Each full-length TPS is characterized by two conserved domains with Pfam IDs PF01397 (N-terminal) and PF03936 (C-terminal) [1]. The N-terminal domain has a conserved RRX8W motif, and the C-terminal domain has a conserved DDXXD motif and an NSE/DTE motif [12]. TPSs constitute a mid-size gene family whose size varies greatly among plants [12]. To date, TPS gene families have been identified genome-wide in various plant species, ranging from spermatophytes to mosses [13]. According to phylogenetic analysis, the plant TPS family can be classified into seven subfamilies (TPS-a, TPS-b, TPS-c, TPS-d, TPS-e/f, TPS-g, and TPS-h) [12,13]. Genes of different subfamilies encode terpene synthases with different functions; for example, TPS-a subfamily genes encode sesquiterpene synthases, while TPS-b and TPS-g subfamily genes encode monoterpene synthases [14]. TPS-d is a gymnosperm-specific subfamily whose members function as diterpene, monoterpene, and sesquiterpene synthases [15]. TPS genes can also be classified according to their genomic structure into class I (13-15 exons), class II (10 exons), and class III (7 exons) [16].
The genus Mentha encompasses mint species cultivated for their essential oils, which are widely used in the flavor, fragrance, and aromatherapy industries [17]. The major constituents of mint essential oils are monoterpenes, including (−)-menthol, (+)-neomenthol, (+)-isomenthol, (+)-carvone, and (+)-menthofuran [18,19]. The biosynthetic pathway of the most abundant oil constituents has been well illustrated in peppermint (Mentha × piperita L.) and spearmint (Mentha spicata L.) [20,21]. Genome research on peppermint and spearmint has progressed slowly because of their complex polyploidy.
Horse mint (Mentha longifolia) is an ancestral species of the genus Mentha that has been developed as a model species for mint genomics because of its diploid genome structure, relatively small genome, and other genetic features [22]. The genome of M. longifolia has been sequenced and the assembly updated to pseudochromosome-level quality, which provides good opportunities for genome-wide analysis of terpenoid biosynthesis in the genus Mentha [23]. Considering the importance of terpenoid compounds in M. longifolia and the limited knowledge of their biosynthesis, genome-wide identification of TPS genes was conducted in this study. Then, sequence features, gene family classification, genome localization, and phylogenetic analyses were performed to characterize the TPS family. Furthermore, a candidate TPS gene encoding a limonene synthase was cloned, and its catalytic activity was assayed.

2. Materials and Methods

2.1. Data Retrieval and Identification of TPSs

The proteome data of the sequenced Lamiaceae plants were downloaded from http://www.ndctcm.org/shujukujieshao/2015-04-23/27.html (Salvia miltiorrhiza) [24], http://caps.ncbs.res.in/Ote/ (Ocimum tenuiflorum) [25], http://ocri-genomics.org/Sinbase/ (Sesamum indicum) [26], and http://gigadb.org/dataset/100463 (Salvia splendens) [27] (accessed on 21 July 2020). For the identification of TPSs, the TPS-specific Pfam N-terminal domain model (PF01397) and C-terminal domain model (PF03936) were downloaded from the Pfam website (http://pfam.xfam.org/) [28]. Then, an HMM search (HMMER v3.1b2) [29] was conducted against each proteome using the PF01397 and PF03936 domain models as queries. Candidate genes with both N-terminal and C-terminal domains were considered complete TPSs and used for further analysis. The Arabidopsis TPS sequences were downloaded from TAIR (https://www.arabidopsis.org/) (accessed on 21 July 2020). The genome data of M. longifolia were downloaded from the Mint Genomics Resource (http://langelabtools.wsu.edu/mgr/) (accessed on 5 May 2020). The assembly of the M. longifolia genome contains 12 large scaffolds encompassing 462.6 Mb, which is consistent with the previously reported genome size (400~500 Mb) [22]. The new assembly corresponds to at least 92.5% of the predicted genome size. Because the M. longifolia genome sequence assembly lacked gene predictions, a BLAT-based method was used to identify TPSs in the assembly [30]. The protein query set representing the TPS family used for BLAT was constructed from the PF01397 and PF03936 seed sequences. The target sequences and their flanking sequences in the M. longifolia genome sequence were extracted and then imported into Genscan for gene prediction [31]. The conserved N-terminal and C-terminal domains of M. longifolia TPSs were confirmed on the SMART website (http://smart.embl-heidelberg.de/).
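The rule that a complete TPS must carry both Pfam domains, and the coverage figure quoted for the assembly, can be illustrated with a minimal Python sketch. The protein IDs and Pfam version suffixes below are hypothetical, and the input is assumed to be a list of (protein, Pfam accession) pairs already parsed from a domain-search report; this is not the pipeline used in the study.

```python
# Pfam accessions named in the text.
TPS_N = "PF01397"  # N-terminal domain
TPS_C = "PF03936"  # C-terminal domain


def complete_tps(hits):
    """Return IDs of proteins hit by both the N- and C-terminal domain models.

    `hits` is an iterable of (protein_id, pfam_accession) pairs.
    Version suffixes such as ".21" are ignored.
    """
    domains = {}
    for protein_id, accession in hits:
        domains.setdefault(protein_id, set()).add(accession.split(".")[0])
    return sorted(pid for pid, found in domains.items()
                  if {TPS_N, TPS_C} <= found)


def assembly_coverage(assembly_mb, genome_size_mb):
    """Assembly size as a percentage of an estimated genome size."""
    return 100.0 * assembly_mb / genome_size_mb


# Hypothetical hits, for illustration only.
example_hits = [
    ("protA", "PF01397.21"), ("protA", "PF03936.16"),
    ("protB", "PF01397.21"),
    ("protC", "PF03936.16"),
]
print(complete_tps(example_hits))                # only protA has both domains
print(round(assembly_coverage(462.6, 500), 1))   # versus the upper size estimate
```

Dividing by the upper bound of the reported size range (500 Mb) gives the most conservative coverage, which is presumably why the text says "at least".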

2.2. Multiple Sequence Alignment and Phylogenetic Analyses

The multiple sequence alignment of TPSs from M. longifolia and other plants was performed using MUSCLE 3.6 [32]. The alignment was imported into MEGA X to construct the phylogenetic tree [33]. The tree was constructed using the maximum likelihood method with the Jones-Taylor-Thornton (JTT) model, and bootstrap support was assessed with 1000 replicates. The phylogenetic tree was further modified using iTOL (https://itol.embl.de/) [34].

2.3. Characterization of TPSs from M. longifolia

The gene structure of TPSs from M. longifolia was determined from the annotation information and then illustrated using Exon-Intron Graphic Maker (http://www.wormweb.org/exonintron). Subcellular localization of M. longifolia TPSs was predicted using the AtSubP tool (http://bioinfo3.noble.org/AtSubP/index.php) and ProtComp (http://linux1.softberry.com/berry.phtml?topic=protcomppl&group=programs&subgroup=proloc). The location of M. longifolia TPS genes on the scaffolds was determined using TBtools [35]. Tandemly duplicated genes were identified by their sequence similarity and scaffold localization according to earlier studies [36,37]. The conserved motifs of M. longifolia TPSs, including the RR(X)8W motif, NSE/DTE motif, RXR motif, and DDXXD motif, were identified from the multiple sequence alignment.

2.4. Adaptive Evolution Analysis of M. longifolia TPSs

Based on the phylogenetic tree and the gene duplication analysis of the M. longifolia TPS gene family, 14 paralog pairs were selected to calculate the nonsynonymous-to-synonymous substitution ratio (Ka/Ks). The calculation was conducted using KaKs_Calculator 2.0 [38] with the sliding window method (90 bp window and 30 bp step). Then, the site models of EasyCodeML [39] were used to conduct adaptive evolution analyses on each subfamily of M. longifolia TPSs. Three pairs of models (M0 (one-ratio) vs. M3 (discrete), M1a (neutral) vs. M2a (positive selection), and M7 (β) vs. M8 (β + ω)) were chosen to test for positive selection using the likelihood ratio test (LRT) and the Bayes empirical Bayes (BEB) method [40,41].
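The sliding-window scheme (90 bp window, 30 bp step) can be sketched as follows. This only generates window coordinates and labels a given Ka/Ks value; it does not compute Ka/Ks itself, which the study delegated to KaKs_Calculator 2.0. The function names and the 1800 bp example length are illustrative assumptions.

```python
def sliding_windows(length, window=90, step=30):
    """Return (start, end) pairs, 0-based and end-exclusive, for every full
    window that fits in an alignment of `length` bp."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def selection_label(ka_ks):
    """Conventional reading of a Ka/Ks ratio."""
    if ka_ks > 1:
        return "positive selection"
    if ka_ks < 1:
        return "purifying selection"
    return "neutral"


# Illustrative alignment length; both window and step are multiples of 3,
# so every window starts and ends on a codon boundary.
windows = sliding_windows(1800)
print(len(windows), windows[0], windows[-1])
```

Because consecutive windows overlap by 60 bp, neighbouring Ka/Ks estimates are not independent, which is worth keeping in mind when reading isolated peaks above 1.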

2.5. RNA Isolation and MlongTPS29 Cloning

The M. longifolia plant used for RNA extraction was obtained from the Botanical Garden Berlin-Dahlem in Germany (accession number ES-0-B-0180887) and then cultivated at the Germplasm Nursery of the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, Jiangsu Province. Total RNA was extracted from M. longifolia leaves using a FastPure Plant Total RNA Isolation Kit (Vazyme, Nanjing, China) according to the manufacturer’s instructions. After quality and concentration checks, 1 μg of total RNA was used to synthesize first-strand cDNA with a HiScript II 1st Strand cDNA Synthesis Kit (Vazyme, Nanjing, China). To identify the candidate limonene synthase in the M. longifolia genome sequence, the limonene synthases of M. spicata (AAC37366.1) and M. piperita (ABW86881.1) were used as BLAST queries against the M. longifolia TPSs. Polymerase chain reaction (PCR) was performed to amplify MlongTPS29 with a gene-specific forward primer (5′-ATGGCTTTCAAAGTGTTTAGTG-3′) and reverse primer (5′-TCATGCAAAGGGCTCGAAT-3′). The amplified fragments were purified using the TaKaRa MiniBEST Agarose Gel DNA Extraction Kit Ver.4.0 (Takara, Dalian, China) and then cloned into the pClone007 Blunt Simple Vector (Tsingke, Beijing, China). Positive clones were screened and sequenced for confirmation.

2.6. Expression of Recombinant MlongTPS29 in Escherichia coli and Enzyme Assays

The coding sequence of MlongTPS29 was cloned into the prokaryotic expression vector pET28a using the homologous recombination method. Briefly, MlongTPS29 was amplified with primers containing homology arms. The forward primer was 5′-CAAATGGGTCGCGGATCCATGGCTTTCAAAGTGTTTAGTG-3′, and the reverse primer was 5′-GGCCGCAAGCTTGTCGACTCATGCAAAGGGCTCGAAT-3′ (italics indicate homology arms). The pET28a vector was digested with the restriction endonucleases BamHI and SalI. Then, homologous recombination was performed with a Trelief™ SoSoo Cloning Kit Ver.2 (Tsingke, Beijing, China) according to the manufacturer’s instructions. The recombinant vector was transformed into E. coli BL21 (DE3), and the expression of recombinant MlongTPS29 was induced by the addition of isopropyl-β-D-thiogalactoside (IPTG) to a final concentration of 1 mM. After culture at 16 °C for 20 h, the cells were collected by centrifugation and washed twice with reaction buffer (50 mM HEPES, pH 7.5, with 5 mM MgCl2, 2 mM MnCl2, 200 mM KCl, 5 mM dithiothreitol, and 10% (v/v) glycerol). Then, the cells were resuspended in reaction buffer and disrupted by sonication. After centrifugation at 16,000× g at 4 °C for 15 min, the supernatant was collected and used for enzyme assays. The enzyme activity of MlongTPS29 was measured according to an earlier report with minor modifications [42]. Briefly, the supernatant of E. coli expressing recombinant MlongTPS29 was added to a 200 μL reaction mixture, and 10 μM GPP was then added to initiate the reaction. The reaction mixture was incubated at 30 °C for 1 h. The reaction products were extracted with dichloromethane and then analyzed on an Agilent 8860/5977B GC-MS system equipped with a DB-5MS column (30 m × 0.25 mm i.d.). The oven temperature was held at 45 °C, then increased at a rate of 10 °C/min to 220 °C, and maintained at 220 °C for 2 min.
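Since the italic marking of the homology arms does not survive in plain text, the arms can be recovered by comparing the recombination primers with the gene-specific primers given in Section 2.5. The short Python sketch below does this and checks that each arm ends with the recognition site of the enzyme used to cut pET28a (GGATCC for BamHI, GTCGAC for SalI). The helper names are illustrative; the sequences are copied from the text.

```python
GENE_FWD = "ATGGCTTTCAAAGTGTTTAGTG"
GENE_REV = "TCATGCAAAGGGCTCGAAT"
RECOMB_FWD = "CAAATGGGTCGCGGATCCATGGCTTTCAAAGTGTTTAGTG"
RECOMB_REV = "GGCCGCAAGCTTGTCGACTCATGCAAAGGGCTCGAAT"

# Recognition sites of the two enzymes used to linearize the vector.
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}


def split_arm(recomb_primer, gene_primer):
    """Return the 5' homology arm that precedes the gene-specific part."""
    if not recomb_primer.endswith(gene_primer):
        raise ValueError("primer does not end with the gene-specific sequence")
    return recomb_primer[: -len(gene_primer)]


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


fwd_arm = split_arm(RECOMB_FWD, GENE_FWD)
rev_arm = split_arm(RECOMB_REV, GENE_REV)
print("forward arm:", fwd_arm, "| ends with BamHI site:", fwd_arm.endswith(SITES["BamHI"]))
print("reverse arm:", rev_arm, "| ends with SalI site:", rev_arm.endswith(SITES["SalI"]))
# The coding strand should start with ATG and end with a stop codon.
print(GENE_FWD[:3], reverse_complement(GENE_REV)[-3:])
```

Both arms are 18 nt long, and the gene-specific parts are identical to the primers used for the initial amplification, so the insert spans the full coding sequence from the start codon to the stop codon.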

3. Results

3.1. Identification of TPS Genes in M. longifolia Genome Sequence

HMM-based and BLAST-based methods are commonly used to identify the TPS gene family in plants. In this study, because the M. longifolia genome lacked gene predictions, a BLAT-based method was used to identify the TPS family. Using the conserved TPS N-terminal domain (PF01397) and C-terminal domain (PF03936) seed sequences as queries, 89 TPS-N and 99 TPS-C genes were identified after gene model prediction. By comparing the two results, 78 candidate TPS genes were obtained. After manually confirming the conserved domains, we finally identified 63 TPSs containing both the TPS N-terminal and TPS C-terminal domains in the M. longifolia genome sequence (Table 1, File S1).
Table 1

Statistics of TPS gene information of Mentha longifolia.

Gene ID | Scaffold | Start | End | Strand | Gene Length (bp) | CDS (bp) | Amino Acids | Exon Number | pI | Mw (kDa) | Localization
MlongTPS1 | scaffold3 | 25207839 | 25211071 | − | 3233 | 1635 | 544 | 7 | 5.08 | 62.93 | Chloroplast a/Cytoplasm b
MlongTPS2 | scaffold5 | 41734433 | 41737382 | − | 2950 | 1488 | 495 | 6 | 5.28 | 57.36 | Chloroplast a/Cytoplasm b
MlongTPS3 | scaffold5 | 41781767 | 41784235 | − | 2469 | 1638 | 545 | 6 | 4.99 | 63.01 | Chloroplast a/Cytoplasm b
MlongTPS4 | scaffold2 | 42600236 | 42604433 | + | 4198 | 1626 | 541 | 7 | 5.63 | 63.19 | Chloroplast a/Cytoplasm b
MlongTPS5 | scaffold2 | 42646914 | 42652153 | + | 5240 | 1626 | 541 | 7 | 5.56 | 63.06 | Chloroplast a/Cytoplasm b
MlongTPS6 | scaffold2 | 42808876 | 42813607 | + | 4732 | 1641 | 546 | 7 | 5.70 | 63.65 | Chloroplast a/Cytoplasm b
MlongTPS7 | scaffold10 | 2519038 | 2521515 | + | 2478 | 1572 | 523 | 8 | 5.01 | 60.86 | Chloroplast a/Cytoplasm b
MlongTPS8 | scaffold10 | 2869515 | 2871994 | + | 2480 | 1674 | 557 | 7 | 5.11 | 65.04 | Chloroplast a/Cytoplasm b
MlongTPS9 | scaffold10 | 3245887 | 3248093 | + | 2207 | 1311 | 436 | 8 | 5.82 | 51.00 | Chloroplast a,b
MlongTPS10 | scaffold10 | 24101862 | 24105239 | + | 3378 | 1341 | 446 | 7 | 5.94 | 52.67 | Chloroplast a/Cytoplasm b
MlongTPS11 | scaffold10 | 26605063 | 26606857 | − | 1795 | 1155 | 384 | 6 | 6.97 | 44.60 | Chloroplast a/Cytoplasm b
MlongTPS12 | scaffold8 | 2619187 | 2622034 | − | 2848 | 1482 | 493 | 6 | 5.44 | 57.37 | Chloroplast a/Cytoplasm b
MlongTPS13 | scaffold8 | 2629991 | 2633116 | − | 3126 | 1563 | 520 | 7 | 5.59 | 60.59 | Chloroplast a/Cytoplasm b
MlongTPS14 | scaffold11 | 22094766 | 22101682 | + | 6917 | 2589 | 862 | 13 | 5.30 | 100.10 | Chloroplast a,b
MlongTPS15 | scaffold11 | 22132562 | 22135423 | + | 2862 | 1791 | 596 | 7 | 5.26 | 69.43 | Chloroplast a,b
MlongTPS16 | scaffold11 | 22353164 | 22356569 | + | 3406 | 1782 | 593 | 7 | 5.73 | 68.84 | Chloroplast a,b
MlongTPS17 | scaffold11 | 22376541 | 22381192 | − | 4652 | 1560 | 519 | 10 | 5.78 | 60.82 | Chloroplast a,b
MlongTPS18 | scaffold11 | 22424761 | 22430157 | − | 5397 | 1449 | 482 | 7 | 5.46 | 56.62 | Chloroplast a,b
MlongTPS19 | scaffold11 | 29807062 | 29810465 | + | 3404 | 1782 | 593 | 7 | 5.65 | 68.76 | Chloroplast a,b
MlongTPS20 | scaffold11 | 29816966 | 29822114 | + | 5149 | 1362 | 453 | 6 | 7.12 | 52.57 | Chloroplast a,b
MlongTPS21 | scaffold11 | 29845320 | 29849984 | − | 4665 | 1320 | 439 | 8 | 5.79 | 51.21 | Chloroplast a,b
MlongTPS22 | scaffold11 | 29920867 | 29925533 | − | 4667 | 1476 | 491 | 7 | 5.61 | 57.69 | Chloroplast a,b
MlongTPS23 | scaffold4 | 34738619 | 34741984 | + | 3366 | 1374 | 457 | 7 | 5.74 | 53.33 | Chloroplast a,b
MlongTPS24 | scaffold4 | 34742308 | 34744838 | − | 2531 | 1800 | 599 | 7 | 5.41 | 69.98 | Chloroplast a,b
MlongTPS25 | scaffold5 | 285351 | 288259 | + | 2909 | 1734 | 577 | 7 | 5.18 | 67.16 | Chloroplast a,b
MlongTPS26 | scaffold5 | 291563 | 294867 | + | 3305 | 1737 | 578 | 7 | 5.46 | 67.19 | Chloroplast a,b
MlongTPS27 | scaffold5 | 296099 | 298389 | − | 2291 | 1383 | 460 | 5 | 5.78 | 53.55 | Chloroplast a,b
MlongTPS28 | scaffold5 | 11506827 | 11509585 | − | 2759 | 1800 | 599 | 7 | 5.32 | 69.92 | Chloroplast a,b
MlongTPS29 | scaffold5 | 11621067 | 11623817 | − | 2751 | 1800 | 599 | 7 | 5.43 | 69.91 | Chloroplast a,b
MlongTPS30 | scaffold5 | 21893670 | 21898545 | − | 4876 | 1779 | 592 | 7 | 6.23 | 69.34 | Chloroplast a,b
MlongTPS31 | scaffold2 | 19325281 | 19331000 | + | 5720 | 1737 | 578 | 7 | 5.36 | 67.30 | Chloroplast a,b
MlongTPS32 | scaffold10 | 30749715 | 30752287 | + | 2573 | 1653 | 550 | 7 | 5.55 | 63.29 | Chloroplast a,b
MlongTPS33 | scaffold10 | 30761480 | 30765652 | − | 4173 | 1599 | 532 | 8 | 5.55 | 62.05 | Chloroplast a,b
MlongTPS34 | scaffold10 | 30776115 | 30779012 | − | 2898 | 1374 | 457 | 6 | 6.07 | 53.11 | Chloroplast a/Cytoplasm b
MlongTPS35 | scaffold10 | 30785670 | 30788296 | − | 2627 | 1590 | 529 | 7 | 6.77 | 61.55 | Chloroplast a,b
MlongTPS36 | scaffold4 | 37761090 | 37769581 | + | 8492 | 2430 | 809 | 15 | 6.76 | 92.10 | Chloroplast a,b
MlongTPS37 | scaffold9 | 4343490 | 4348710 | − | 5221 | 2409 | 802 | 14 | 5.95 | 91.97 | Chloroplast a,b
MlongTPS38 | scaffold9 | 4410562 | 4415127 | − | 4566 | 2178 | 725 | 15 | 7.84 | 82.44 | Chloroplast a,b
MlongTPS39 | scaffold9 | 4626769 | 4631237 | − | 4469 | 2304 | 767 | 14 | 5.84 | 87.25 | Chloroplast a,b
MlongTPS40 | scaffold8 | 14598298 | 14605058 | − | 6761 | 2346 | 781 | 14 | 6.19 | 89.79 | Chloroplast a,b
MlongTPS41 | scaffold9 | 4215819 | 4220540 | − | 4722 | 2085 | 694 | 13 | 5.65 | 80.41 | Chloroplast a,b
MlongTPS42 | scaffold9 | 4297285 | 4301128 | − | 3844 | 1737 | 578 | 11 | 6.10 | 67.05 | Chloroplast a,b
MlongTPS43 | scaffold9 | 4315863 | 4321588 | − | 5726 | 1755 | 584 | 11 | 5.48 | 67.38 | Chloroplast a,b
MlongTPS44 | scaffold9 | 4400967 | 4404832 | − | 3866 | 1827 | 608 | 14 | 5.90 | 70.06 | Chloroplast a,b
MlongTPS45 | scaffold9 | 4663702 | 4668738 | + | 5037 | 1752 | 583 | 14 | 5.43 | 66.94 | Chloroplast a,b
MlongTPS46 | scaffold9 | 4696275 | 4699991 | + | 3717 | 1689 | 562 | 10 | 5.58 | 65.28 | Chloroplast a,b
MlongTPS47 | scaffold9 | 4746792 | 4752673 | − | 5882 | 2295 | 764 | 14 | 5.88 | 87.58 | Chloroplast a,b
MlongTPS48 | scaffold9 | 4791367 | 4793719 | − | 2353 | 1134 | 377 | 6 | 5.31 | 43.28 | Mitochondrion a/Chloroplast b
MlongTPS49 | scaffold9 | 4890741 | 4894353 | − | 3613 | 1734 | 577 | 10 | 5.69 | 66.69 | Chloroplast a,b
MlongTPS50 | scaffold9 | 4940721 | 4944084 | + | 3364 | 1536 | 511 | 9 | 5.30 | 59.27 | Mitochondrion a/Chloroplast b
MlongTPS51 | scaffold9 | 4988299 | 4993896 | + | 5598 | 2292 | 763 | 14 | 5.77 | 87.38 | Chloroplast a,b
MlongTPS52 | scaffold9 | 5111972 | 5115082 | + | 3111 | 1515 | 504 | 9 | 5.38 | 58.34 | Mitochondrion a/Chloroplast b
MlongTPS53 | scaffold9 | 7132180 | 7139762 | + | 7583 | 1755 | 584 | 11 | 5.38 | 67.56 | Chloroplast a,b
MlongTPS54 | scaffold9 | 31439884 | 31443309 | − | 3426 | 1350 | 449 | 8 | 5.03 | 52.24 | Chloroplast a,b
MlongTPS55 | scaffold9 | 31907037 | 31911201 | − | 4165 | 1533 | 510 | 9 | 5.09 | 59.61 | Chloroplast a,b
MlongTPS56 | scaffold9 | 31917248 | 31919875 | − | 2628 | 1578 | 525 | 9 | 5.53 | 60.86 | Chloroplast a,b
MlongTPS57 | scaffold8 | 2453217 | 2457977 | − | 4761 | 2322 | 773 | 14 | 5.62 | 88.21 | Chloroplast a,b
MlongTPS58 | scaffold8 | 2469812 | 2471751 | − | 1940 | 1308 | 435 | 7 | 5.22 | 50.43 | Chloroplast a,b
MlongTPS59 | scaffold10 | 30078136 | 30083625 | − | 5490 | 2478 | 825 | 12 | 5.99 | 94.00 | Chloroplast a/Cytoplasm b
MlongTPS60 | scaffold11 | 3129977 | 3133005 | + | 3029 | 1521 | 506 | 6 | 5.97 | 57.84 | Unknown a/Cytoplasm b
MlongTPS61 | scaffold3 | 44742988 | 44745414 | + | 2427 | 1572 | 523 | 7 | 7.04 | 61.62 | Unknown a/Cytoplasm b
MlongTPS62 | scaffold6 | 2272054 | 2274523 | + | 2470 | 1728 | 575 | 7 | 5.82 | 66.44 | Unknown a/Cytoplasm b
MlongTPS63 | scaffold6 | 15636480 | 15639592 | − | 3113 | 1764 | 587 | 7 | 5.31 | 66.38 | Unknown a/Cytoplasm b

a Results predicted by the AtSubP tool. The prediction approach followed the best hybrid-based classifier (AA + PSSM + N-Center-C + PSI-BLAST).
b Results predicted by ProtComp.
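The numeric columns of Table 1 are internally consistent, which gives a simple way to sanity-check individual rows: the gene length equals End − Start + 1, and the amino acid count equals CDS/3 − 1. The second relation assumes that the CDS length includes the stop codon; this is an inference from the table values, not something the text states. A minimal sketch over three rows copied from Table 1:

```python
# (gene, start, end, gene_length_bp, cds_bp, amino_acids), copied from Table 1.
ROWS = [
    ("MlongTPS1", 25207839, 25211071, 3233, 1635, 544),
    ("MlongTPS14", 22094766, 22101682, 6917, 2589, 862),
    ("MlongTPS29", 11621067, 11623817, 2751, 1800, 599),
]


def check_row(start, end, gene_len, cds_bp, aa):
    """True if the coordinates and lengths of one row agree with each other."""
    span_ok = end - start + 1 == gene_len
    # Assumption: the CDS length counts the stop codon, so protein = CDS/3 - 1.
    protein_ok = cds_bp % 3 == 0 and cds_bp // 3 - 1 == aa
    return span_ok and protein_ok


for gene, *values in ROWS:
    print(gene, "consistent" if check_row(*values) else "inconsistent")
```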

3.2. Phylogenetic Analyses of TPSs from M. longifolia and Other Lamiaceae Plants

To examine the evolutionary relationships of M. longifolia TPSs, a phylogenetic tree was constructed using the M. longifolia TPSs and TPSs from Arabidopsis thaliana and the other four sequenced Lamiaceae plants, namely, O. tenuiflorum, S. indicum, S. miltiorrhiza, and S. splendens. In the phylogenetic tree, the TPS proteins clustered into six subfamilies, namely TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g (Figure 1). No TPS-d or TPS-h gene was identified, because TPS-d is gymnosperm-specific and TPS-h has only been observed in Selaginella moellendorffii [12]. Some species-specific clades were observed; for example, 22 TPS-a subfamily genes of A. thaliana clustered into one clade and 11 TPS-b subfamily genes of S. splendens clustered into another. In all of the Lamiaceae plants analyzed in this study except M. longifolia, TPS-a was the largest subfamily; in M. longifolia, the TPS-b subfamily had more genes than the TPS-a subfamily (Table 2). A comparison of the gene numbers of each subfamily also showed that the TPS-e subfamily in the M. longifolia genome sequence assembly was much larger than in the other Lamiaceae plants, indicating a significant species-specific expansion of the TPS-e subfamily in M. longifolia (Table 2).
Figure 1

Phylogenetic analysis of TPSs in M. longifolia, Arabidopsis thaliana and other Lamiaceae plants. Species: M. longifolia (Mlong), Ocimum tenuiflorum (Ote), Sesamum indicum (Sin), Salvia miltiorrhiza (Smi), Salvia splendens (Ssp), A. thaliana (Ath).

Table 2

Statistics of TPS subfamily gene numbers in M. longifolia, A. thaliana and other Lamiaceae plants.

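Species | TPS-a | TPS-b | TPS-c | TPS-e | TPS-f | TPS-g | Total
M. longifolia | 13 | 22 | 5 | 18 | 1 | 4 | 63
O. tenuiflorum | 14 | 12 | 7 | 2 | 1 | 7 | 43
S. indicum | 21 | 5 | 6 | 3 | 0 | 7 | 42
S. miltiorrhiza | 32 | 21 | 5 | 2 | 1 | 3 | 64
S. splendens | 52 | 30 | 7 | 7 | 2 | 6 | 104
A. thaliana | 22 | 6 | 1 | 1 | 1 | 1 | 32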
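The subfamily comparison in Table 2 can be reproduced with a few lines of Python. The counts are copied from the table; the dictionary layout and function names are illustrative choices.

```python
SUBFAMILIES = ("a", "b", "c", "e", "f", "g")

# Gene counts per subfamily, copied from Table 2.
COUNTS = {
    "M. longifolia": (13, 22, 5, 18, 1, 4),
    "O. tenuiflorum": (14, 12, 7, 2, 1, 7),
    "S. indicum": (21, 5, 6, 3, 0, 7),
    "S. miltiorrhiza": (32, 21, 5, 2, 1, 3),
    "S. splendens": (52, 30, 7, 7, 2, 6),
    "A. thaliana": (22, 6, 1, 1, 1, 1),
}


def shares(species):
    """Percentage of the species' TPS genes in each subfamily (one decimal)."""
    counts = COUNTS[species]
    total = sum(counts)
    return {sub: round(100.0 * n / total, 1) for sub, n in zip(SUBFAMILIES, counts)}


def largest_subfamily(species):
    """Subfamily with the most genes in the given species."""
    return max(zip(SUBFAMILIES, COUNTS[species]), key=lambda pair: pair[1])[0]


for name in COUNTS:
    print(name, sum(COUNTS[name]), "largest: TPS-" + largest_subfamily(name))
print(shares("M. longifolia"))
```

With these counts, TPS-b is the largest subfamily only in M. longifolia, and TPS-e accounts for a far higher share there than in any of the other species.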

3.3. Classification of M. longifolia TPSs Based on the Phylogenetic Tree

The phylogenetic analysis of the 63 M. longifolia TPSs was performed using MEGA X with the maximum likelihood method. Based on the phylogenetic tree, the 63 M. longifolia TPSs could be divided into six subfamilies: 13 TPS-a genes, 22 TPS-b genes, 5 TPS-c genes, 18 TPS-e genes, 1 TPS-f gene, and 4 TPS-g genes. The TPS-e and TPS-f subfamilies are usually merged into one subfamily because TPS-f is derived from TPS-e, and here they clustered into one clade (Figure 2). It is worth noting that the M. longifolia genome sequence contains 18 TPS-e subfamily genes, many more than reported for most other plants [13].
Figure 2

Phylogenetic analysis, subfamily classification, gene structure and conserved domains in M. longifolia TPSs. The black rectangles represent exons and the lines represent introns. The coding sequences of the conserved N-terminal domain, C-terminal domains, RR(X)8W motif, NSE/DTE motif, RXR motif, and DDXXD motif are represented in green, orange, purple, blue, gray, and red, respectively.

3.4. Exon-Intron Structure of M. longifolia TPS Genes

The numbers of exons and introns in plant TPS genes are relatively low. According to the intron-exon pattern, TPS genes can be divided into three classes, class I, class II, and class III, which contain 12-14 introns, 9 introns, and 6 introns, respectively [16]. In this study, most TPS-a, TPS-b, and TPS-g subfamily genes of M. longifolia contained six to eight exons and five to seven introns (Table 1 and Figure 2), and they all belonged to class III TPSs. The TPS-c subfamily genes contained 14 to 15 exons and 13 to 14 introns (Table 1 and Figure 2) and belonged to class I TPSs. The gene structure of the TPS-e subfamily genes showed relatively large variation. Their exon numbers varied from 6 to 14, and some of them had lost exons at the 5′ end (Table 1 and Figure 2).

3.5. Genomic Distribution of M. longifolia TPS Genes

The 63 TPS genes were mapped to nine scaffolds of the M. longifolia genome sequence assembly based on their localization information (Figure 3). The distribution of these genes was uneven; for example, only two TPS genes each mapped onto scaffold3 and scaffold6, while 19 TPS genes clustered on scaffold9. Clustering of members of the same subfamily was also observed, such as nine TPS-b genes on scaffold11 and 16 TPS-e genes on scaffold9. Tandem duplication and segmental duplication are common mechanisms by which gene copy number increases in plants, and both were analyzed for the TPS genes in this study. Seven tandem duplications and three segmental duplications of TPS genes were observed in the M. longifolia genome sequence assembly, together involving 30 TPS genes (Figure 3). The duplication events occurred in the TPS-a, TPS-b, and TPS-e subfamilies.
Figure 3

Scaffold localization of TPS genes in the M. longifolia genome sequence assembly. The assembly contains 12 large scaffolds encompassing 462.6 Mb, and the 63 TPS genes are mapped to nine scaffolds based on their localization information. The Y-axis represents the length of the scaffolds. TPS genes of the TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g subfamilies are shown in blue, orange, purple, green, red, and gray fonts, respectively. Tandemly and segmentally duplicated TPS genes are connected by red lines.
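The uneven distribution can be checked directly from the Scaffold column of Table 1. In the sketch below, the list holds the scaffold number of MlongTPS1 to MlongTPS63 in table order; it was transcribed from Table 1 for illustration and is not an output of TBtools.

```python
from collections import Counter

# Scaffold number of MlongTPS1..MlongTPS63, in Table 1 order.
SCAFFOLDS = (
    [3, 5, 5, 2, 2, 2, 10, 10, 10, 10, 10, 8, 8]  # MlongTPS1-13
    + [11] * 9                                     # MlongTPS14-22
    + [4, 4]                                       # MlongTPS23-24
    + [5] * 6                                      # MlongTPS25-30
    + [2]                                          # MlongTPS31
    + [10] * 4                                     # MlongTPS32-35
    + [4]                                          # MlongTPS36
    + [9] * 3                                      # MlongTPS37-39
    + [8]                                          # MlongTPS40
    + [9] * 16                                     # MlongTPS41-56
    + [8, 8, 10, 11, 3, 6, 6]                      # MlongTPS57-63
)

genes_per_scaffold = Counter(SCAFFOLDS)
for scaffold, n_genes in sorted(genes_per_scaffold.items()):
    print(f"scaffold{scaffold}: {n_genes} TPS genes")
```

The tally covers nine scaffolds, with scaffold9 carrying the most genes and scaffold3 and scaffold6 the fewest, in agreement with the counts given in the text.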

3.6. Conserved Motif Analyses of M. longifolia TPSs

TPSs harbor conserved structural features such as the RR(X)8W motif in the N-terminal domain and the DDXXD and NSE/DTE motifs in the C-terminal domain, which play important roles in the catalytic function of TPSs [12,43]. In our study, conserved motifs were analyzed in M. longifolia TPSs, and significant differentiation was found between subfamilies (Figure 4). The RR(X)8W motif is conserved in the TPS-b subfamily and plays a role in the initiation of the isomerization-cyclization reaction [44]. Both the TPS-b and TPS-g subfamilies are angiosperm monoterpene synthases, but the TPS-g proteins do not contain this motif. The TPS-g proteins are required for the biosynthesis of acyclic monoterpenes, which form floral volatile organic compounds (VOCs) [45]. The TPS-a subfamily has been reported to encode only sesquiterpene synthases, in which the second arginine of the RR(X)8W motif is not conserved [46]. The NSE/DTE motif is conserved in most subfamilies except for the TPS-c subfamily. The RXR motif is conserved in the TPS-a and TPS-b subfamilies. The DDXXD motif is the most conserved motif among these TPSs and is present in the TPS-a, TPS-b, TPS-e, TPS-f, and TPS-g subfamilies but not in the TPS-c subfamily (Figure 4). The DDXXD motif is involved in the coordination of divalent ions and water molecules and the stabilization of the active site [47,48]. The TPS-c proteins are not expected to have this motif, as they do not cleave the prenyl diphosphate unit; however, they contain a DXDD motif that is critical for the protonation-initiated reaction [49].
Figure 4

The conserved RR(X)8W, NSE/DTE, RXR, and DDXXD motifs in M. longifolia TPSs.
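Motif notation such as RR(X)8W or DDXXD translates directly into regular expressions, with X standing for any residue. The following sketch scans a protein sequence for the motifs whose pattern is fully specified in the text; NSE/DTE is left out because the text gives no explicit consensus for it. The toy sequence is hypothetical and is not an M. longifolia TPS, and this scan is a simplification of the alignment-based identification used in the study.

```python
import re

# X in the motif notation means "any residue". Lookaheads allow overlapping hits.
MOTIFS = {
    "RR(X)8W": re.compile(r"(?=RR.{8}W)"),
    "RXR": re.compile(r"(?=R.R)"),
    "DDXXD": re.compile(r"(?=DD..D)"),
    "DXDD": re.compile(r"(?=D.DD)"),
}


def find_motifs(protein):
    """Return {motif name: [1-based start positions]} for a protein sequence."""
    return {
        name: [match.start() + 1 for match in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, for illustration only.
toy = "MASRRSANYQPSIWDHAAALRDRGGDDIYDVAA"
for motif, positions in find_motifs(toy).items():
    print(motif, positions)
```

Short patterns such as RXR will match by chance in most proteins, so in practice a hit is only meaningful at the expected position in the alignment.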

3.7. Adaptive Evolution Analysis of M. longifolia TPSs

To explore whether positive selection drove the evolution of the M. longifolia TPS gene family, the nonsynonymous-to-synonymous substitution ratio (Ka/Ks = ω) was calculated. Using a sliding window of 90 bp and a step of 30 bp, the Ka/Ks ratios of 14 M. longifolia TPS paralog pairs were calculated (Figure 5). A few sites in eight paralog pairs (three, three, and two for the TPS-a, TPS-b, and TPS-e subfamilies, respectively) had Ka/Ks > 1, and most sites had Ka/Ks < 1, suggesting that most M. longifolia TPS genes were subjected to purifying selection after the species-specific expansions. To further investigate the selection pressures acting on M. longifolia TPS genes, the site models were fitted to each subfamily using EasyCodeML. As shown in Table 3, all the subfamilies were subject to purifying selection, with ω ranging from 0.202 to 0.310. Some amino acid residues under positive selection were identified in the TPS-c and TPS-g subfamilies.
Figure 5

Sliding-window adaptive evolution analysis of the M. longifolia TPS paralog genes. (A–C) represent paralog genes of TPS-a, TPS-b, and TPS-e subfamilies, respectively.

Table 3

Tests for selection among codons of M. longifolia TPSs using site models.

TPS Subfamily | Model | np | Ln L | Estimates of Parameters | Models Compared | LRT p-Value | Positive Sites
TPS-a | M3 | 29 | −6662.29 | p: 0.300 0.605 0.095; ω: 0.047 0.287 0.782 | M0 vs. M3 | 0.000 | []
TPS-a | M0 | 25 | −6742.49 | ω0: 0.225 |  |  | Not Allowed
TPS-a | M2a | 28 | −6701.40 | p: 0.819 0.044 0.138; ω: 0.191 1.000 1.000 | M1a vs. M2a | 1.000 | []
TPS-a | M1a | 26 | −6701.40 | p: 0.819 0.181; ω: 0.191 1.000 |  |  | Not Allowed
TPS-a | M8 | 28 | −6664.45 | p0 = 0.989, p = 0.948, q = 2.701; p1 = 0.011, ω = 1.525 | M7 vs. M8 | 0.631 | 212 C 0.781
TPS-a | M7 | 26 | −6664.91 | p = 0.912, q = 2.472 |  |  | Not Allowed
TPS-b | M3 | 47 | −2367.77 | p: 0.109 0.602 0.289; ω: 0.000 0.228 0.612 | M0 vs. M3 | 0.000 | []
TPS-b | M0 | 43 | −2393.98 | ω0: 0.289 |  |  | Not Allowed
TPS-b | M2a | 46 | −2382.37 | p: 0.756 0.123 0.121; ω: 0.230 1.000 1.000 | M1a vs. M2a | 1.000 | []
TPS-b | M1a | 44 | −2382.37 | p: 0.756 0.244; ω: 0.230 1.000 |  |  | Not Allowed
TPS-b | M8 | 46 | −2374.65 | p0 = 1.000, p = 1.135, q = 2.498; p1 = 0.000, ω = 1.000 | M7 vs. M8 | 1.000 | 
TPS-b | M7 | 44 | −2374.65 | p = 1.135, q = 2.498 |  |  | Not Allowed
TPS-c | M3 | 13 | −9115.18 | p: 0.548 0.420 0.032; ω: 0.070 0.407 8.173 | M0 vs. M3 | 0.000 | []
TPS-c | M0 | 9 | −9231.50 | ω0: 0.202 |  |  | Not Allowed
TPS-c | M2a | 12 | −9133.53 | p: 0.779 0.166 0.055; ω: 0.129 1.000 1.000 | M1a vs. M2a | 1.000 | []
TPS-c | M1a | 10 | −9133.53 | p: 0.779 0.221; ω: 0.129 1.000 |  |  | Not Allowed
TPS-c | M8 | 12 | −9115.20 | p0 = 0.968, p = 0.772, q = 2.595; p1 = 0.032, ω = 8.049 | M7 vs. M8 | 0.000 | 8 F 0.567, 16 A 0.551, 19 L 0.515, 28 Y 0.916, 32 I 0.748, 33 K 0.649, 41 E 0.627, 212 L 0.711, 591 L 0.828, 636 E 0.875, 637 Q 0.838, 639 M 0.851, 640 A 0.712, 641 A 0.611, 643 V 0.944, 647 D 0.627, 654 K 0.738
TPS-c | M7 | 10 | −9124.83 | p = 0.673, q = 1.922 |  |  | Not Allowed
TPS-e | M3 | 39 | −6467.88 | p: 0.300 0.539 0.160; ω: 0.077 0.351 0.785 | M0 vs. M3 | 0.000 | []
TPS-e | M0 | 35 | −6537.92 | ω0: 0.310 |  |  | Not Allowed
TPS-e | M2a | 38 | −6492.46 | p: 0.739 0.167 0.095; ω: 0.231 1.000 1.000 | M1a vs. M2a | 1.000 | []
TPS-e | M1a | 36 | −6492.46 | p: 0.739 0.261; ω: 0.231 1.000 |  |  | Not Allowed
TPS-e | M8 | 38 | −6468.70 | p0 = 0.966, p = 1.035, q = 2.155; p1 = 0.034, ω = 1.000 | M7 vs. M8 | 0.858 | 45 R 0.514, 234 V 0.633
TPS-e | M7 | 36 | −6468.86 | p = 0.962, q = 1.829 |  |  | Not Allowed
TPS-g | M3 | 11 | −5784.14 | p: 0.284 0.560 0.156; ω: 0.046 0.296 24.257 | M0 vs. M3 | 0.000 | []
TPS-g | M0 | 7 | −5866.96 | ω0: 0.202 |  |  | Not Allowed
TPS-g | M2a | 10 | −5795.20 | p: 0.652 0.232 0.117; ω: 0.134 1.000 1.000 | M1a vs. M2a | 1.000 | []
TPS-g | M1a | 8 | −5795.20 | p: 0.652 0.348; ω: 0.134 1.000 |  |  | Not Allowed
TPS-g | M8 | 10 | −5784.63 | p0 = 0.869, p = 0.935, q = 2.849; p1 = 0.131, ω = 31.804 | M7 vs. M8 | 0.008 | 15 K 0.532, 141 C 0.547, 177 N 0.551, 294 R 0.510, 299 W 0.517, 363 R 0.524, 423 D 0.501
TPS-g | M7 | 8 | −5789.50 | p = 0.716, q = 1.590 |  |  | Not Allowed
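The LRT p-values in Table 3 can be approximately reproduced from the log-likelihoods and parameter counts. The sketch below assumes the usual procedure, in which twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the difference in np; whether EasyCodeML applies exactly this rule is an assumption here. For even degrees of freedom the chi-square tail has a closed form, so only the standard library is needed.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable for even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lrt_p_value(lnl_null, lnl_alt, np_null, np_alt):
    """p-value of a likelihood ratio test between two nested models."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return chi2_sf_even_df(statistic, np_alt - np_null)


# M7 vs. M8 comparisons, with values copied from Table 3.
print("TPS-a", round(lrt_p_value(-6664.91, -6664.45, 26, 28), 3))
print("TPS-c", round(lrt_p_value(-9124.83, -9115.20, 10, 12), 3))
print("TPS-g", round(lrt_p_value(-5789.50, -5784.63, 8, 10), 3))
```

Under this assumption, only the TPS-c and TPS-g comparisons fall below 0.05, consistent with the statement that positively selected residues were identified in those two subfamilies.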

3.8. Enzyme Activity Assays of MlongTPS29

Limonene, whose synthesis is catalyzed by limonene synthase (LS), is an important precursor of the essential oil components of the genus Mentha. To identify the candidate LS in the M. longifolia genome sequence, the LSs of M. spicata and M. piperita were used as BLAST queries against the M. longifolia TPSs. As a result, a candidate LS-coding gene, MlongTPS29, was identified in the M. longifolia genome sequence. The coding sequence of MlongTPS29 is 1800 bp, which is the same as that of the LS homologs in M. spicata and M. piperita. Multiple sequence alignment also showed that MlongTPS29 was highly similar to the LSs of M. spicata and M. piperita (Figure S1). Both the sequence length and the sequence similarity indicate that MlongTPS29 is complete. This gene was cloned and its catalytic activity was assayed. The recombinant MlongTPS29 was heterologously expressed in E. coli and used in an in vitro reaction. After GPP was added as a substrate, GC-MS analysis showed that limonene could be detected in the MlongTPS29 group, while no limonene was detected in the empty pET28a group (Figure 6). This result indicates that MlongTPS29 can catalyze the production of limonene from GPP.
Figure 6

GC-MS analysis of the products formed by recombinant MlongTPS29 proteins via in vitro assays. (A,B) Total ion current of products yielded by pET28a and pET28a-MlongTPS29, respectively. (C) Mass spectrum of the indicated peak.
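
The completeness argument in Section 3.8 rests partly on the length of the coding sequence. As a rough illustration, a basic structural check of a coding sequence can be sketched as follows. The function name and the checks are assumptions for illustration only, not the authors' procedure, and the 1800 bp sequence below is a synthetic placeholder rather than the real MlongTPS29 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_summary(cds):
    """Report simple structural properties of a candidate coding sequence."""
    seq = cds.upper()
    in_frame = len(seq) % 3 == 0
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    ends_with_stop = in_frame and bool(codons) and codons[-1] in STOP_CODONS
    internal_stops = sum(1 for c in codons[:-1] if c in STOP_CODONS)
    return {
        "length_bp": len(seq),
        "in_frame": in_frame,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": ends_with_stop,
        "internal_stops": internal_stops,
        # Residues encoded before the terminal stop codon.
        "protein_length_aa": len(codons) - 1 if ends_with_stop else len(codons),
    }


def looks_complete(cds):
    """True if the sequence has a start codon, a terminal stop, and an open frame."""
    s = cds_summary(cds)
    return (s["in_frame"] and s["starts_with_atg"]
            and s["ends_with_stop"] and s["internal_stops"] == 0)


# Synthetic 1800 bp placeholder: ATG + 598 alanine codons + stop codon.
placeholder = "ATG" + "GCT" * 598 + "TAA"
print(cds_summary(placeholder)["protein_length_aa"])  # 599
print(looks_complete(placeholder))                    # True
```

An 1800 bp coding sequence that includes its stop codon corresponds to 600 codons, that is, a 599-residue protein. Such a check complements, but does not replace, the comparison with known LS homologs described above.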

4. Discussion

The genus Mentha has important economic value because of its abundant essential oils. The major constituents of mint essential oils are monoterpenes and sesquiterpenes [18,19]. Mentha plants (especially peppermint and spearmint) have been employed as model systems for the study of monoterpene biosynthesis [20,21]. However, complex polyploidy and a lack of genomic information have limited further study. Horse mint (M. longifolia) is a diploid ancestor species of the genus Mentha and has been developed as a model species for mint genomics [22]. The completion of M. longifolia genome sequencing provides an opportunity to perform functional genomic studies of Mentha plants [23]. In this study, the TPS gene family, which encodes key enzymes positioned at the branch point of terpenoid biosynthesis, was identified and analyzed genome-wide in the M. longifolia genome sequence assembly. A total of 63 complete TPS genes were identified in the assembly according to the conserved N-terminal and C-terminal domains of TPS. TPS is a medium-sized gene family whose size varies among plants (approximately 20-150 genes) [12]. The number of TPS genes in the M. longifolia genome sequence assembly is moderate compared with that of other reported plants.

According to the phylogenetic analysis, the TPSs of M. longifolia fall into six known angiosperm TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g). No genes of the gymnosperm-specific TPS-d subfamily or the S. moellendorffii-specific TPS-h subfamily were identified. However, recent studies indicate that the TPS-d subfamily is not gymnosperm-specific; it has also been found in Ananas comosus and Marchantia polymorpha [13]. TPS-b is the largest subfamily in the M. longifolia genome sequence and has more members than the TPS-a subfamily (34.9% TPS-b genes and 20.6% TPS-a genes). This is in contrast to most other plants, such as A. thaliana (18.8% TPS-b genes and 68.8% TPS-a genes) [50], Vitis vinifera (29.0% TPS-b genes and 43.5% TPS-a genes) [46], and Oryza sativa (5.0% TPS-b genes and 62.5% TPS-a genes) [13]. The genomic distribution analysis showed that there were some tandem duplicates and segmental duplicates among the TPS-b genes, which might explain the increased number of TPS-b subfamily genes in the M. longifolia genome sequence [13]. The TPS-b subfamily is mainly responsible for catalyzing the biosynthesis of monoterpenoids, and monoterpenoids are the main components of the essential oils of Mentha plants [1,18]. Therefore, we speculate that the expansion of the TPS-b subfamily in Mentha may be related to the rich monoterpenoid content.

Another interesting phenomenon is that there are 18 TPS-e subfamily genes in the M. longifolia genome sequence, which is far more than in most other plants. It is worth noting that most TPS-e genes (15 of 18) are located on scaffold9, and tandem duplicates also exist in this subfamily. Whether the species-specific expansion of TPS-e in M. longifolia has led to functional differentiation remains unclear. An integrated chemical-genomic-phylogenetic approach in Lamiaceae revealed that gene family expansion, rather than increased enzyme promiscuity of terpene synthases, is correlated with mono- and sesquiterpene diversity [51]. GC-MS analysis showed that mono- and sesquiterpenes are more diverse in the genus Mentha than in other genera of Lamiaceae [51]. The catalytic function of the expanded TPS-e subfamily needs further investigation.

TPS genes can also be classified according to their genomic structure into class I (13-15 exons), class II (10 exons), and class III (7 exons), which appear to have evolved sequentially from class I to class III [16]. Class I TPSs consist primarily of diterpene synthases found in gymnosperms (secondary metabolism) and angiosperms (primary metabolism). Class II TPSs evolved from class I by loss of the conifer diterpene internal sequence domain. Class III TPSs consist of angiosperm monoterpene, sesquiterpene, and diterpene synthases involved in secondary metabolism, and evolved from class II by loss of introns [16]. Gene structure differs between subfamilies, whereas members of the same subfamily show only minor differences. The TPS-a, TPS-b, and TPS-g subfamilies, with 6 to 8 exons, belong to class III, while TPS-c, TPS-e, and TPS-f, with 13 to 15 exons, belong to class I. In the M. longifolia genome sequence, the gene structure of the TPSs is largely consistent with the subfamily classification, except for TPS-e. Comparison with TPS-e genes from other plants showed that some M. longifolia TPS-e genes have lost exons at the 5′ end. It has been suggested that, during evolution, class I TPS genes successively lose exons and introns to form a new class, so we speculate that these exon-losing TPS genes may be involved in this evolutionary process. Whether this exon loss affects their function remains unclear.

The main components of the essential oils of Mentha plants are monoterpenoids, whose biosynthesis is mainly catalyzed by the TPS-b subfamily. In this study, we selected MlongTPS29, a putative limonene synthase-encoding gene belonging to the TPS-b subfamily, for catalytic activity analysis. Limonene is the most important precursor of the essential oil components of the genus Mentha, and its synthesis is catalyzed by limonene synthase. In peppermint and spearmint (two widely cultivated Mentha plants), limonene synthase has been identified and shown to catalyze the synthesis of limonene from GPP [52]. The results of our study indicate that MlongTPS29 can also catalyze the production of limonene from GPP in vitro.
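
The exon-count classification discussed above can be expressed as a small lookup. This is a sketch under stated assumptions: the exon windows follow the numbers quoted in the Discussion, with the 6 to 8 exon range reported for M. longifolia used for class III instead of the single value of 7 exons given in [16]. The function names and the example gene are hypothetical.

```python
# Structural class expected for each subfamily, as stated in the Discussion.
EXPECTED_CLASS = {
    "TPS-a": "class III", "TPS-b": "class III", "TPS-g": "class III",
    "TPS-c": "class I", "TPS-e": "class I", "TPS-f": "class I",
}


def structure_class(exon_count):
    """Assign a genomic-structure class from the number of exons."""
    if 13 <= exon_count <= 15:
        return "class I"
    if exon_count == 10:
        return "class II"
    if 6 <= exon_count <= 8:  # assumption: broader window than the 7 exons in [16]
        return "class III"
    return "unclassified"


def is_consistent(subfamily, exon_count):
    """True if the exon count matches the class expected for the subfamily."""
    return structure_class(exon_count) == EXPECTED_CLASS[subfamily]


# Hypothetical TPS-e gene that has lost exons at its 5' end.
print(structure_class(11))         # unclassified
print(is_consistent("TPS-e", 11))  # False
print(is_consistent("TPS-b", 7))   # True
```

Genes flagged as inconsistent by such a check, like the hypothetical TPS-e example, are the ones for which the exon-loss question raised above would be worth examining.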

5. Conclusions

In this study, we identified and analyzed the TPS gene family genome-wide in the genome sequence assembly of M. longifolia, a model plant for functional genomic research in the genus Mentha. A total of 63 TPS genes were identified in the M. longifolia genome sequence and could be divided into six subfamilies. The TPS-e subfamily had 18 members and showed a significant species-specific expansion compared with other plants. The 63 TPS genes could be mapped to nine scaffolds of the M. longifolia genome sequence assembly, and tandem duplicates and segmental duplicates contributed greatly to the increase in the number of TPS genes. The conserved motifs of M. longifolia TPSs differed significantly between subfamilies. Adaptive evolution analysis showed that M. longifolia TPSs were subjected to purifying selection after the species-specific expansion, and some amino acid residues under positive selection were identified. We also cloned a TPS-b gene, MlongTPS29, which encodes a limonene synthase that catalyzes the biosynthesis of limonene, an important precursor of essential oils from the genus Mentha. This study provides useful information on the biosynthesis of terpenoids in the genus Mentha.
  51 in total

1.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

2.  Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene.

Authors:  R Nielsen; Z Yang
Journal:  Genetics       Date:  1998-03       Impact factor: 4.562

3.  Genomic organization of plant terpene synthases and molecular evolutionary implications.

Authors:  S C Trapp; R B Croteau
Journal:  Genetics       Date:  2001-06       Impact factor: 4.562

4.  MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors:  Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

5.  Genomic analysis of the terpenoid synthase ( AtTPS) gene family of Arabidopsis thaliana.

Authors:  S Aubourg; A Lecharny; J Bohlmann
Journal:  Mol Genet Genomics       Date:  2002-06-29       Impact factor: 3.291

Review 6.  Terpenoid biosynthesis and specialized vascular cells of conifer defense.

Authors:  Katherine G Zulak; Jörg Bohlmann
Journal:  J Integr Plant Biol       Date:  2010-01       Impact factor: 7.061

7.  Functional annotation, genome organization and phylogeny of the grapevine (Vitis vinifera) terpene synthase gene family based on genome assembly, FLcDNA cloning, and enzyme assays.

Authors:  Diane M Martin; Sébastien Aubourg; Marina B Schouwey; Laurent Daviet; Michel Schalk; Omid Toub; Steven T Lund; Jörg Bohlmann
Journal:  BMC Plant Biol       Date:  2010-10-21       Impact factor: 4.215

8.  EasyCodeML: A visual tool for analysis of selection using CodeML.

Authors:  Fangluan Gao; Chengjie Chen; Daej A Arab; Zhenguo Du; Yehua He; Simon Y W Ho
Journal:  Ecol Evol       Date:  2019-03-01       Impact factor: 2.912

9.  High-quality assembly of the reference genome for scarlet sage, Salvia splendens, an economically important ornamental plant.

Authors:  Ai-Xiang Dong; Hai-Bo Xin; Zi-Jing Li; Hui Liu; Yan-Qiang Sun; Shuai Nie; Zheng-Nan Zhao; Rong-Feng Cui; Ren-Gang Zhang; Quan-Zheng Yun; Xin-Ning Wang; Fatemeh Maghuly; Ilga Porth; Ri-Chen Cong; Jian-Feng Mao
Journal:  Gigascience       Date:  2018-07-01       Impact factor: 6.524

10.  The Pfam protein families database in 2019.

Authors:  Sara El-Gebali; Jaina Mistry; Alex Bateman; Sean R Eddy; Aurélien Luciani; Simon C Potter; Matloob Qureshi; Lorna J Richardson; Gustavo A Salazar; Alfredo Smart; Erik L L Sonnhammer; Layla Hirsh; Lisanna Paladin; Damiano Piovesan; Silvio C E Tosatto; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

  3 in total

1.  Cloning and Characterization of 1,8-Cineole Synthase (SgCINS) Gene From the Leaves of Salvia guaranitica Plant.

Authors:  Mohammed Ali; Dikhnah Alshehri; Abeer Mousa Alkhaibari; Naeema A Elhalem; Doaa Bahaa Eldin Darwish
Journal:  Front Plant Sci       Date:  2022-04-15       Impact factor: 6.627

2.  Genome-wide identification and analysis of terpene synthase (TPS) genes in celery reveals their regulatory roles in terpenoid biosynthesis.

Authors:  Mengyao Li; Xiaoyan Li; Jin Zhou; Yue Sun; Jiageng Du; Zhuo Wang; Ya Luo; Yong Zhang; Qing Chen; Yan Wang; Yuanxiu Lin; Yunting Zhang; Wen He; Xiaorong Wang; Haoru Tang
Journal:  Front Plant Sci       Date:  2022-09-29       Impact factor: 6.627

3.  Correction: Chen et al. Genome-Wide Analysis of Terpene Synthase Gene Family in Mentha longifolia and Catalytic Activity Analysis of a Single Terpene Synthase. Genes 2021, 12, 518.

Authors:  Zequn Chen; Kelly J Vining; Xiwu Qi; Xu Yu; Ying Zheng; Zhiqi Liu; Hailing Fang; Li Li; Yang Bai; Chengyuan Liang; Weilin Li; Bernd Markus Lange
Journal:  Genes (Basel)       Date:  2021-12-28       Impact factor: 4.096
