Literature DB >> 29339971

Terzyme: a tool for identification and analysis of the plant terpenome.

Piyush Priya1, Archana Yadav1, Jyoti Chand1, Gitanjali Yadav1,2.   

Abstract

BACKGROUND: Terpenoid hydrocarbons represent the largest and most ancient group of phytochemicals, such that the entire chemical library of a plant is often referred to as its 'terpenome'. Besides having numerous pharmacological properties, terpenes contribute to the scent of the rose, the flavors of cinnamon and the yellow of sunflowers. Rapidly increasing -omics datasets provide an unprecedented opportunity for terpenome detection, paving the way for automated web resources dedicated to phytochemical predictions in genomic data.
RESULTS: We have developed Terzyme, a predictive algorithm for identification, classification and assignment of broad substrate unit to terpene synthase (TPS) and prenyl transferase (PT) enzymes, known to generate the enormous structural and functional diversity of terpenoid compounds across the plant kingdom. Terzyme uses sequence information, plant taxonomy and machine learning methods for predicting TPSs and PTs in genome and proteome datasets. We demonstrate a significant enrichment of the currently identified terpenome by running Terzyme on more than 40 plants.
CONCLUSIONS: Terzyme is the result of a rigorous analysis of evolutionary relationships between hundreds of characterized sequences of TPSs and PTs with known specificities, followed by analysis of genome-wide gene distribution patterns, ontology based clustering and optimization of various parameters for building accurate profile Hidden Markov Models. The predictive webserver and database is freely available at http://nipgr.res.in/terzyme.html and would serve as a useful tool for deciphering the species-specific phytochemical potential of plant genomes.

Entities:  

Keywords:  GO clustering; Hidden Markov Models (HMM); Pathway mapping; Phytochemicals; Prenyl transferase (PT); Terpene synthase (TPS); Terpenome

Year:  2018        PMID: 29339971      PMCID: PMC5761147          DOI: 10.1186/s13007-017-0269-0

Source DB:  PubMed          Journal:  Plant Methods        ISSN: 1746-4811            Impact factor:   4.993


Background

Modern plants have adapted to the sessile nature of life on land by evolving mechanisms for chemical communication and defence, mediated via low molecular weight compounds, often with complex structures, which have the ability to function in diverse physiological, developmental and evolutionary processes [1]. These phytochemicals, grouped together as plant secondary metabolites, have diversified in both structure and function via gene duplications followed by sub-functionalisation and positive selection for metabolite expansion, such that each species has its unique arsenal of secondary metabolites, many of which are of great significance to humans [2, 3]. Isoprenoids or ‘terpenoids’ represent the largest, most ancient group of phytochemicals, and the entire chemical library of a plant is often referred to as the ‘terpenome’ [4]. Well-known terpenoids include citral, menthol, camphor, cannabinoids and the curcuminoids found in turmeric and mustard seeds. Biosynthesis of terpenes requires the condensation of universal precursor C5 isoprene units to form C15 or C20 prenyl diphosphates (PDPs), catalyzed by short chain prenyl transferase (PT) enzymes, followed by multi-step cyclization reactions catalysed by a huge family of unique enzymes called the terpene synthases (TPSs) [5, 6]. TPSs catalyze one of the most complex reactions known to chemistry and biology, wherein, hundreds of regio- and stereo-specific products can be made from a single substrate by binding and steering polyisoprene substrates through a precise, multistep cyclization cascade that is initiated by the propagation of a highly reactive carbocation [7, 8]. Both PTs and TPSs have a distinct ‘terpene fold’ composed largely of inert amino acids (aa) lining a central active site [9]. They can exhibit very high specificity in product formation with remarkable stereochemical precision [10], as well as huge chemical promiscuity [11]. Molecular investigations of TPSs are an active area of research from the perspective of metabolic engineering. TPSs have been identified and characterized in model plant species of commercial and agronomic value such as Arabidopsis thaliana [12], Citrus [13], Vitis vinifera [14] and Solanum lycopersicum [15], as well as in various gymnosperms [16-18]. The TPS gene family in plants reveals functional diversification with members showing clear divergence in different lineages despite similar sequences and structures [19]. This makes it quite challenging to assign substrate specificity to a newly annotated TPS sequence [20]. TPSs have been classified according to two major classification schemes; one based on their functional roles and product formation, whereas the other is based entirely on sequence homology. As per the former scheme, TPSs are divided into three subclasses, namely, monoterpene synthases (Mono-TPSs), sesquiterpene synthases (Sesqui-TPSs), and diterpene synthases (Di-TPSs), depending upon the number of isoprene units condensed by the enzyme, which may be two, three or four, respectively. In terms of protein length, Di-TPSs are the longest (> 850aa) as compared to monoterpene synthases (ranging from 600 to 650aa), and sesquiterpene synthases (between 550 and 580aa long), and this difference arises from an interspersed sequence element in Di-TPSs, conserved both in location and amino acid composition [21]. According to the second TPS classification scheme, seven families are recognized currently, from TPSa to TPSg, with the original clades of TPSe and TPSf merged into a single TPS-e/f subfamily [19, 20, 22]. Of these, the TPSc clade is proposed to be the most ancient, and contains mono- and bifunctional copalyl diphosphate synthase (CPS) proteins from gymnosperms as well as angiosperms. TPSd clade is specific to gymnosperms while the TPSa, b and g subfamilies are angiosperm-specific. TPS-e/f combines the sister subclade-e (representing kaurene synthase B) and its derivative subclade-f that contains linalool synthases, hypothesized to be dicot-specific [20, 22]. From a physiological viewpoint, TPSe and TPSc subfamily members including the (−)-CPS synthases are distantly related to primary metabolism while TPSa, b, and d gene subfamilies are involved in secondary metabolism and show greater diversification. In this work, we present a comprehensive attempt to identify and classify the PT and TPS gene families in 42 plant species for which nuclear genome sequence data is available in the public domain, leading to the development of Terzyme, an interactive online webserver and database for predictive identification and analysis of the plant terpenome. We also present a detailed computational analysis that was undertaken to assess TPS gene distribution patterns, domain organization and potential functional roles, in order to understand the evolution of novel biochemical functions in different lineages, and to unravel the complexity of the plant terpenome. Assessment of genome wide distribution patterns as well as clustering among the genes of the identified TPSs is important in view of the fact that plants are well known for the occurrence of both genic and chromosomal duplications that have resulted in the widespread existence of gene families in this kingdom, apart from being associated with subsequent evolutionary divergence via sub-functionalization or neo-functionalization [23]. We hope that Terzyme will provide insights into the concept of lineage-specific expansion in the PT and TPS families in various plant species together with their functional roles and to understand the evolution of terpene biosynthetic machinery in plants.

Results

Annotated terpenome data

The curated terpenome data was compiled as described in methods, involving retrieval of sequences from the NCBI Protein Database via keyword specific search for prenyl transferases (PTs) as well as all TPS functional classes (mono-, di- and sesqui-TPSs) as well as the sequence homology based gene-family classes, namely TPSa to TPSg [19]. The function-based (FB dataset) consisted of 401 representatives sequences, including 154 monoterpene synthases, 71 diterpene synthases and 176 sesquiterpene synthases, as shown in Table 1. These sequences represent diverse taxonomic classes of green plants, including land plants, which further include seed plants, with the exception of chlorophytes. For prenyl transferases, a total of 301 PT sequences were compiled as shown in the last column of Table 1, and this data was called the PT dataset. Mosses have not been reported to have any prenyl transferases at all. Additional file 1: Table 1 provides a detailed list of accession IDs for each sequence used in the FB and PT datasets. The gene-family based GB dataset was also compiled as described in methods and Table 2 shows the 326 sequences retrieved for this dataset. A detailed list of accession IDs for each sequence, along with species and sub-class information has been provided in Additional file 2: Table 2. As can be seen in Table 2, this dataset contains 113 TPSa, 49 TPSb, 35 TPSc, 48 TPSd, 50 TPSe_f and 31 TPSg sequences, mainly present in seed plants. No TPS has been annotated to date in chlorophytes. Lower plants like ferns and mosses also have very few PTs or TPSs. Similarly, genomes of ancient land plants like the magnoliales also appear to lack PTs or TPSs as per their current annotations. As expected, only gymnosperm sequences are present in the TPSd subfamily, known to be specific to this clade. In all, the FB, GB and the PT datasets consist of 116, 74 and 112 species respectively, the majority being monocots or dicots.
Table 1

Data representing the known function-based TPS subfamily (FB) and prenyl transferase family dataset

Plant domain (#plants)MonoTPSsDiTPSsSesquiTPSsTotal (FB)PT (#plants)
Chlorophytes15 (7)
Bryophytes
Pteridophytes (1)4042 (1)
Gymnosperms (18)4926108514 (6)
Ancient Angiosperms (3)052072 (2)
Monocots (14)1116346129 (10)
Eudicots (80)8925130244239 (86)
Total (116)15471176401301 (112)

#Number of species per category provided in brackets. For details, see supplementary data

Table 2

TPS data representing the known gene family based (GB) dataset

TPSaTPSbTPScTPSdTPSe_fTPSgTotal
Chlorophytes (0)
Bryophytes (1)0101
Pteridophytes (1)0303
Gymnosperms (8)02480252
Monocots (13)34408170366
Eudicots (51)7945213128204
Total (74)1134935485031326

For details, see supplementary data

Data representing the known function-based TPS subfamily (FB) and prenyl transferase family dataset #Number of species per category provided in brackets. For details, see supplementary data TPS data representing the known gene family based (GB) dataset For details, see supplementary data

Profile hidden Markov models and predictive accuracy

Profile HMMs were built for the prenyl transferase family and all 12 classes of TPS subfamilies, as described in methods. For TPSs, analysis was divided into two parts, function based analysis and gene family based analysis. In function-based analysis the input sequence is classified into a monoterpene, diterpene or sesquiterpene synthase using the six profile HMMs specific to function based (FB) dataset. In gene family based (GB) analysis, the test sequence is assigned to one of the six gene families described earlier from TPSa to g, with HMMs being generated from the GB dataset. In order to test the predictive accuracy of the program, benchmarking was done as described in methods, and this revealed a sensitivity of 100% in all cases, albeit with a relatively low accuracy range of 51–61%. This indicates that although each individual sub-family search profile is able to successfully identify true positives, the twelve HMMs also have a tendency towards false positives, i.e. acquisition of TPS sequences from other sub-families. The high rate of false positives severely affects sub-family annotation towards prediction of substrate specificity and may be ascribed to the strong homology between various TPS sub-classes, both at sequence and structural levels. Table 3 provides a sense of this overlap through an inter-family sequence and structural fold comparison. The various TPS sub-family representatives used for structural superimposition included the 1,8-cineole synthase from Salvia fruticosa for Monoterpene synthases (PDBID 25JC), the Taxadiene synthase from Pacific yew for Di-TPSs (PDB ID 3P5P) and the 5-epi aristolochene synthase from Nicotiana tabacum (PDBID 3M00) representing sesqui-TPSs. As a result, correct annotation and classification of newly identified TPSs becomes a significant challenge in view of conservation between the different TPS superfamilies. In order to overcome this challenge posed by high false positives without losing out the perfect sensitivity achieved by each sub-family profile, we developed a pipeline wherein all twelve profile HMMs would be allowed to scan a new sequence in parallel. We then based the final sub-family assignment on the premise that high sensitivity of the true sub-family profile would overshadow the false positive scores of the remaining eleven non-self HMMs. In this manner, even though a given test sequence may be identified by multiple sub-family HMMs, the highest scoring hit would still remain the true sub-family profile. In order to test the veracity of our selection premise, all 12 HMMs were combined into a pipeline for scanning the test set dataset, followed by sorting based on highest score obtained by each sub-family profile. As expected, benchmarking of the program in this manner significantly enhanced the accuracy of the search algorithm as can be seen in Table 4. This table shows the results of predictive performance of HMMs calculated using the statistical concepts of sensitivity and accuracy as discussed in methodology section, before and after the parallel-scan strategy. As can be seen from Table 4, the final accuracy of the search algorithm increases to 100% for a majority of sub-families, with the exception of dicot-specific profiles for Mono- and Sesqui-TPSs, both of which show accuracy above 85%.
Table 3

Structural and sequence similarity between TPS sub-families

Unshaded cells represent average %Similarity between members of one subfamily with another while shaded cells represent structural overlap values with RMSD values in brackets. The PDB IDs used for inter-subfamily structural comparison are 25JC (Mono-TPS), 3P5P (Di-TPS) and 3M00 (Sesqui-TPS), representing the 1,8-cineole synthase from Salvia fruticosa, Taxadiene synthase from Pacific yew, and the 5-epi aristolochene synthase from Nicotiana tabacum respectively

Table 4

Predictive performance of the 12 profile HMMs (in %)

Category of terpenome prediction modelsSensitivityAccuracyAccuracy after combining HMMs
Monoterpene synthases in dicots (MonoD)1006188
Monoterpene synthases in monocots (MonoM)10051100
Diterpene synthases in dicots (DiD)10053100
Diterpene synthases in monocots (DiM)10051100
Sesquiterpene synthases in dicots (SesD)1006186
Sesquiterpene synthases in monocots (SesM)10052100
TPSa10060100
TPSb10055100
TPSc10052100
TPSd10054100
TPe_f10054100
TPSg10052100
Structural and sequence similarity between TPS sub-families Unshaded cells represent average %Similarity between members of one subfamily with another while shaded cells represent structural overlap values with RMSD values in brackets. The PDB IDs used for inter-subfamily structural comparison are 25JC (Mono-TPS), 3P5P (Di-TPS) and 3M00 (Sesqui-TPS), representing the 1,8-cineole synthase from Salvia fruticosa, Taxadiene synthase from Pacific yew, and the 5-epi aristolochene synthase from Nicotiana tabacum respectively Predictive performance of the 12 profile HMMs (in %) Encouraged by the superior predictive power of the search algorithm, we proceeded to assess its performance in context with other online web resources. To our knowledge, there is no search tool specific to any category of phytochemicals, but we expected the global annotation databases, (such as Pfam, PANTHER and Interpro) to be able to identify terpene synthase family, and therefore adequate for a comparative performance test. However, we found that none of the currently existing programs could classify TPSs either based on their function i.e. into monoterpene, diterpene or sesquiterpene synthases, nor on the basis of gene family (TPSa-TPSe/f). The Terzyme hidden markov models, in contrast, achieve taxon-based distinction between identified TPSs. One reason for the failure of general annotation databases may be that available programs like Pfam identify TPSs by detecting either the N-terminal domain naming them as Terpene_synth (PF01397) or the metal binding domain designating it as Terpene_synth_C (PF03936) or both. Similarly PANTHER detects N-terminal domain designating it as PTHR31376, other Terpenoid synthase as PTHR31225, PTHR11439 or PTHR31739. Interpro also detects Terpenoid synthase, N terminal domain as IPR001906 and C-terminal metal binding domain as Terpene_synth_C (IPR005630). Hence, from a predictive viewpoint, it can be inferred that our search algorithm performs better than existing programs for TPS gene family identification in the plant kingdom.

Novel terpenome identification

Encouraged by this superior accuracy of prediction, we automated the TPS and PT search pipeline to design and develop the Terzyme interactive online server, available freely without any login requirement at www.nipgr.res.in/terzyme.html. Figure 1 shows the query submission protocol of Terzyme. It has been configured to accept multiple fasta sequences to search for TPSs, and does not require download on local machines for processing. The Terzyme HMM pipeline was used to search for novel TPS and PT gene family members in the entire PSG dataset (containing 1,573,395 protein sequences from 42 plant genomes, as described in Methods), and the resulting identifications have been incorporated into the online web resource for browsing, download and further exploration. Figure 2 shows a few screenshots of the Terzyme prediction server including the browse-able terpenome database. In all, Terzyme identified a total of 3312 unique TPS sequences and 873 unique prenyl transferases. Some of the TPS sequences, as expected, were predicted by both function-based and gene-family based HMMs. These 3312 TPSs and 873 PTs are available for browsing, both by species name and taxonomic class, through the ‘Plant Genome Predictions’ menu of Terzyme website, as shown in panels B of Fig. 2. For each TPS, users can view or download the corresponding sequence in FASTA format, its alignment with the respective sub-family profile HMM, as well as the predicted secondary structure.
Fig. 1

The workflow of terpenome search algorithm

Fig. 2

Screenshots of the Terzyme Web resource developed for terpenome identification and annotation. a shows the browse-able list of 42 plant species on which the program was scanned. Clicking any of these 42 species links will return the outcome of Terzyme for that plant through a screen similar to b, which depicts the putative TPSs identified in the cold season food legume chickpea. Users can also submit their own sequence/s to the Terzyme prediction server and c shows a typical outcome for a query

The workflow of terpenome search algorithm Screenshots of the Terzyme Web resource developed for terpenome identification and annotation. a shows the browse-able list of 42 plant species on which the program was scanned. Clicking any of these 42 species links will return the outcome of Terzyme for that plant through a screen similar to b, which depicts the putative TPSs identified in the cold season food legume chickpea. Users can also submit their own sequence/s to the Terzyme prediction server and c shows a typical outcome for a query A total of 2040 TPSs were identified by the functional class based profile HMMs and these included 613 monoterpene synthases, 468 diterpene synthases and 959 sesquiterpene synthases. Similarly, 2987 TPSs were identified by gene family based profile HMMs which included 1797 TPSa, 432 TPSb, 218 TPSc, 81 TPSd, 270 TPSe_f and 189 TPSg gene family sequences. A complete list of these identifications along with sub-family assignment for each species, including PTs is provided in Tables 5, 6 and 7 respectively. Interestingly, our data shows identification of putative TPSs in three chlorophyte genomes even though none of the profile HMMs were trained on these species. A manual inspection of each sequence shows them to have sufficient length and presence of the requisite TPS motifs. From a functional point of view, all chlorophyte TPSs appear to be Diterpene synthases, the ancient TPS containing family, known previously to consist of both gymnosperm and angiosperm members. A detailed analysis of some of these DiTPSs indicated them to be closely related to Cycloartenol synthase and some were found to contain Squalene cyclase (SQCY) found in class II TPSs. Among bryophytes, Physcomitrella patens shows only one known bifunctional TPSs with both CPS/KS activity has been reported till date [24]. The present analysis reveals the additional presence of at least nine bifunctional TPSs, and we have identified 15 prenyl transferases in the moss genome, suggesting a reasonably large terpenome family with more than 30 members, majority of these being previously unreported diterpene synthases. The gene family profiles assign most of these sequences to the TPSa and TPSc subfamilies. As anticipated, 64 of the 72 putative TPSs identified in the gymnosperm genome (Picea abies) were assigned to TPSd subfamily, supporting the existing view that gymnosperm TPSs belong to a distinct clade [19]. Tables 5 and 6 also depict five gymnosperm TPS sequences representing the most ancient TPSc gene family, with mono/bifunctional CPSs, although we did not observe any bias of representation in case of functional class assignment; all three classes namely monoTPS, diTPS and sesqui-TPS are roughly equally present in this gymnosperm. It may be noted that Terzyme enables a distinction between Class II Copalyl diphosphate synthases (CPS) and the Class-I Kaurene synthases (KSL) also. Under the Plant Genome Predictions Tab, apart from the Diterpene TPS classification, for each genome, Terzyme shows the exact number of matches found for TPSc and TPSe_f classes, both of which represent largely, the CPS and KSL respectively. We believe this is a very useful feature that enables users to breakdown Di-TPS data for detecting better-resolved functional annotations. In general monoterpene and sesquiterpene synthases outnumber the diterpene synthases in all seed plant domains. This may be due to the general mono functional activity of the former compared to bifunctional activity of latter enzymes. Terpenes are known to play significant roles defence responses against herbivores by emissions of several volatile blends, and volatile emissions are mainly composed of monoterpenes and sesquiterpene lactones owing to their low molecular weights (C10 and C15 respectively). It may also be noted that the identified TPS gene family size increases from lower plants (chlorophytes, bryophytes and pteridophytes) to land plants, suggesting expansion of the family during course of evolution. An average of 30–50 TPS sequences were identified across higher plants, with the maximum number of sequences detected in Panicum virgatum (switchgrass), along with commercial fruit bearing dicot species like apple, grape and papaya. The Eucalyptus genome also contains over one hundred TPS sequences and it would be interesting to study these TPSs further and characterize their roles in the respective genomes. Present knowledge of completely characterized TPSs is limited to only few plant species and their classification based on functional roles or gene family is still an emerging field. Our data on the other hand opens up a huge repertoire of putative TPSs candidates throughout the plant kingdom, together with their functional, and gene family based classification. For example, the tomato terpenome was recently characterized with about 40 TPS [15], whereas our analysis reveals at least 60–100 TPSs in the S. lycopersicum genome along with more than 20 PTs. In addition, Diterpene TPS classification has been performed based on class I (Aspartate rich motif (DDXXD/E) or Non-aspartate rich consensus motif of (N,D)D(L,I,V)X(S,T)XXXE also called as ‘NSE/DTE’ motif and class II (DXDD) signature motifs present in the respective sequences. In house perl script were used to scan all the DiTPS annotations to assess the presence of these signature motifs for Class I and Class II TPSs. Accordingly, TERZYME classifies DiTPSs into the following four classes: (1) Class I DiTPS—If either Aspartate rich motif (DDXXD/E) or Non-aspartate rich consensus motif of (N,D)D(L,I,V)X(S,T)XXXE, (2) Class II Diterpene synthases—If DXDD motif was present, (3) bifunctional—If both Class I and Class II motifs are present and (4) Noncanonical : If none are present. Details can be seen in Table 8. The presence of these sub-classes in available plants can also be assessed on the Terzyme database under the Plant Genomes Predictions tab. These examples reflect how a rigorous scientific pursuit can lead to new annotations and gene discovery for previously unknown, and even well-known families of conserved sequences.
Table 5

Terpenome identified by Function

PlantMono TPSsDi TPSsSesTPSsTotal TPSs
Chlorophytes
 Chlamydomonas reinhardtii0202
 Volvox carteri0101
 Coccomyxa subellipsoidea0101
 Micromonas pusilla
 Ostreococcus lucimarinus
Bryophytes
 Physcomitrella patens1515
Pteridophytes
 Selaginella moellendorffii04400650
Gymnosperms
 Picea abies34171869
Monocots
 Poaceae
  Oryza sativa01273967
  Sorghum bicolor07052941
  Zea mays10172956
  Setaria italica05133149
  Panicum virgatum184156115
  Brachypodium distachyon03041320
Dicots
 Brassicaceae
  Arabidopsis lyrata12041733
  Arabidopsis thaliana11042540
  Brassica rapa09062641
  Capsella rubella07042435
  Thellungiella halophila03031420
 Malvaceae
  Gossypium raimondii31114890
  Theobroma cacao17092854
 Rosaceae
  Fragaria vesca13123459
  Malus domestica28244496
  Prunus persica11080726
 Rutaceae
  Citus sinensis33054482
  Citrus clementina08040113
 Solanaceae
  Solanum tuberosum09215383
  Solanum lycopersicum26062355
 Fabaceae
  Glycine max19111040
  Cicer arietinum08031021
  Medicago truncatula14151746
  Phaseolus vulgaris19042043
 Salicaceae
  Populus trichocarpa39113282
 Vitaceae
  Vitis vinifera260969104
 Euphorbiaceae
  Manihot esculenta14132754
  Ricinus communis24112156
 Cucurbitaceae
  Cucumus sativus13051735
 Caricaceae
  Carica papaya10101737
 Myrtaceae
  Eucalyptus grandis461058114
 Phrymaceae
  Mimulus guttatus15252565
 Ranunculaceae
  Aquilega coerulea41131165
 Linaceae
  Linum usitatissimum25241665
Total6134689592040
Table 6

Terpenome identified by gene family

PlantTPSaTPSbTPScTPSdTPSe_fTPSgTotal
Chlorophytes
 Chlamydomonas reinhardtii
 Volvox carteri
 Coccomyxa subellipsoidea
 Micromonas pusilla
 Ostreococcus lucimarinus
Bryophytes
 Physcomitrella patens060901020119
Pteridophytes
 Selaginella moellendorffii14043856
Gymnosperms
 Picea abies010105640172
Monocots
 Poaceae
  Oryza sativa360105150259
  Sorghum bicolor280403030543
  Zea mays23060603070449
  Setaria italica330405090354
  Panicum virgatum680823022005126
  Brachypodium distachyon3105040242
Dicots
 Brassicaceae
  Arabidopsis lyrata191001030235
  Arabidopsis thaliana290602020140
  Brassica rapa370703050153
  Capsella rubella380702030151
  Thellungiella halophila3902010244
 Malvaceae
  Gossypium raimondii46250602030183
  Theobroma cacao9615010505122
 Rosaceae
  Fragaria vesca9411090304121
  Malus domestica17418121420238
  Prunus persica8610020702107
 Rutaceae
  Citus sinensis6132040404105
  Citrus clementina21050501060543
 Solanaceae
  Solanum tuberosum720618020506109
  Solanum lycopersicum6720030505100
 Fabaceae
  Glycine max271008060758
  Cicer arietinum190204010834
  Medicago truncatula6009170509100
  Phaseolus vulgaris48100501010974
 Salicaceae
  Populus trichocarpa573504010503105
 Vitaceae
  Vitis vinifera7816040623127
 Euphorbiaceae
  Manihot esculenta611004130694
  Ricinus communis342501100575
 Cucurbitaceae
  Cucumus sativus350902020452
 Caricaceae
  Carica papaya11810031101143
 Myrtaceae
  Eucalyptus grandis8723020917138
 Phrymaceae
  Mimulus guttatus321317110679
 Ranunculaceae
  Aquilega coerulea203701070570
 Linaceae
  Linum usitatissimum162007170767
Total1797432218812701892987
Table 7

Terpenome identified through prenyl transferase (PT) annotation

PlantTotal PTSs
Chlorophytes
 Chlamydomonas reinhardtii06
 Volvox carteri05
 Coccomyxa subellipsoidea05
 Micromonas pusilla05
 Ostreococcus lucimarinus06
Bryophytes
 Physcomitrella patens15
Pteridophytes
 Selaginella moellendorffii16
Gymnosperms
 Picea abies14
Monocots
 Poaceae
  Oryza sativa22
  Sorghum bicolor09
  Zea mays37
  Setaria italica22
  Panicum virgatum40
  Brachypodium distachyon20
Dicots
 Brassicaceae
  Arabidopsis lyrata20
  Arabidopsis thaliana23
  Brassica rapa26
  Capsella rubella20
  Thellungiella halophila19
 Malvaceae
  Gossypium raimondii53
  Theobroma cacao28
 Rosaceae
  Fragaria vesca11
  Malus domestica41
  Prunus persica11
 Rutaceae
  Citus sinensis38
  Citrus clementina28
 Solanaceae
  Solanum tuberosum31
  Solanum lycopersicum21
 Fabaceae
  Glycine max31
  Cicer arietinum13
  Medicago truncatula10
  Phaseolus vulgaris14
 Salicaceae
  Populus trichocarpa47
 Vitaceae
  Vitis vinifera11
 Euphorbiaceae
  Manihot esculenta15
  Ricinus communis10
 Cucurbitaceae
  Cucumus sativus23
 Caricaceae
  Carica papaya11
 Myrtaceae
  Eucalyptus grandis22
 Phrymaceae
  Mimulus guttatus18
 Ranunculaceae
  Aquilega coerulea29
 Linaceae
  Linum usitatissimum27
Total873
Table 8

Diterpene classification based on Class I and Class II signature motifs

PlantClass IClass IIBifunctionalUnclassified/partialTotal DiTPSs
Chlorophytes
 Chlamydomonas reinhardtii0202
 Volvox carteri0101
 Coccomyxa subellipsoidea0101
 Micromonas pusilla
 Ostreococcus lucimarinus
Bryophytes
 Physcomitrella patens0304010315
Pteridophytes
 Selaginella moellendorffii2903040440
Gymnosperms
 Picea abies09020617
Monocots
 Poaceae
  Oryza sativa1504010727
  Sorghum bicolor03020106
  Zea mays0805030117
  Setaria italica0704010113
  Panicum virgatum18131041
  Brachypodium distachyon030104
Dicots
 Brassicaceae
  Arabidopsis lyrata030104
  Arabidopsis thaliana02010104
  Brassica rapa03010206
  Capsella rubella030104
  Thellungiella halophila020103
 Malvaceae
  Gossypium raimondii02020711
  Theobroma cacao05030109
 Rosaceae
  Fragaria vesca0203020512
  Malus domestica0503011524
  Prunus persica04010308
 Rutaceae
  Citus sinensis020103
  Citrus clementina02010205
 Solanaceae
  Solanum tuberosum03110721
  Solanum lycopersicum03010206
 Fabaceae
  Glycine max05030311
  Cicer arietinum01010103
  Medicago truncatula0302010915
  Phaseolus vulgaris01010204
 Salicaceae
  Populus trichocarpa06040111
 Vitaceae
  Vitis vinifera0401020209
 Euphorbiaceae
  Manihot esculenta09020213
  Ricinus communis08010211
 Cucurbitaceae
  Cucumus sativus040105
 Caricaceae
  Carica papaya05010410
 Myrtaceae
  Eucalyptus grandis08010110
 Phrymaceae
  Mimulus guttatus1007060225
 Ranunculaceae
  Aquilega coerulea100313
 Linaceae
  Linum usitatissimum1002021024
Total468
Terpenome identified by Function Terpenome identified by gene family Terpenome identified through prenyl transferase (PT) annotation Diterpene classification based on Class I and Class II signature motifs Gene ontology analysis for the newly identified TPS genes was carried out in order to further validate our predictions for the plant kingdom, as well as to improve the resolution of functional role prediction, in terms of molecular function or subcellular localization. As described in methods, a total of 2040 TPSs were subjected to ontological analysis and as anticipated, the novel TPSs were found to be enriched in biological process terms like ‘response to stress’, ‘lipid metabolic pathway’ and ‘secondary metabolic process’. More than 900 TPSs were found to be enriched for primary metabolism, although this class of genes is mostly known for secondary/ specialized metabolism. An assessment of these TPSs revealed that they belong to the mono/bifunctional CPS of TPSc and highly divergent TPSe_f gene families. Among molecular function categories, highest enrichment was found for terpene synthase activity, catalytic activity and magnesium ion binding activity, as expected, but in few cases, the GO terms were able to resolve the exact catalytic function for a given TPSs, as in case of hydrolases (37 cases), transferases (65 cases) or protein binding activities. In eight cases, nucleic acid binding was found to be an enriched term and we looked at these cases in more detail in order to understand how TPSs may bind to DNA/RNA to carry out their function. It was interesting to note that nucleic acid binding term was found mainly in case of sesquiterpene synthases in response towards oxidative stress. It may be noted that some TPS genes have previously been known to show single-stranded DNA endo-deoxyribonuclease activity or DNA-directed RNA polymerase activity, and take part in double strand break repair via homologues recombination. Thus, the present eight cases may form part of purine or pyrimidine nucleobase metabolic process. In the sub-cellular compartmentalization category, a majority of mono- and di-TPSs were found to be localized in plastids, as expected, since these are synthesized by the methyl erythritol (MEP) pathway, which is plastidial in nature. In contrast, the sesquiterpene synthases were predominantly found to be located in cytoplasm, the site of occurrence of the mevalonic acid (MVA) pathway that is known to synthesize sesquiterpene and triterpenes. 134 cases did not follow the expected localization trend, wherein a sesqui-TPS was localized to the plastid and, conversely a mono- or di-TPS was predicted to be localized in the cytoplasm, supporting the notion of crosstalk that has been previously hypothesized between MVA and MEP pathways [7]. In summary, the GO analysis further supports Terzyme predictions and the huge repertoire of new TPSs thus identified provides an opportunity for further functional characterization. Further analyses, as presented in the next section were performed to shed light on the roles and identities of these new TPSs.

Analysis of the plant terpenome

The newly identified TPS sequences were subjected to clustering, genome wide mapping and KEGG pathway analysis for understanding their evolution and also for assignment of substrate specificity, as described in Methods. Following GO assignment, the sequences with GO annotations were subjected to EC (Enzyme code) mapping and novel TPSs were mapped onto KEGG pathways, in order to assign putative catalytic roles. In this manner, 539 TPSs were assigned to specific enzymatic categories as shown in Table 9. These included 140 monoterpene synthases, 311 diterpene synthases and 88 sesquiterpene synthases. As can be seen from this table, ent-kaurene synthases were found in the highest number, followed by ent-copalyl diphosphate synthases, both DiTPSs. One each of bornyl-diphosphate synthase, levopimaradiene synthase and germacrene A synthase were predicted, these three representing one each of a MonoTPS, DiTPS and SesquiTPS respectively. Detailed information on accession IDs and substrate preferences for each of these 539 putative TPS sequences is provided in Additional file 3: Table 3. The IGMAP tool [25] was used for clustering the TPSs mapped to 19 plant genomes. Figure 3 depicts the genome-wide terpenome maps for selected monocots and a unicellular green alga, while Fig. 4 depicts the corresponding maps for selected dicotyledonous species. TPSs in both taxa can be observed in clusters often as large as 15 genes, with a tendency to be located towards the centromeres, as in case of rice and Arabidopsis, or towards the edges of the chromosomes, as in sorghum, maize, Brachypodium and most dicots. In Zea mays, TPS clusters map both towards center and towards the end of chromosomes. Statistical tests reinforced the trend observed in the map images, viz., a significant number of sequences representing TPS genes are located in clusters within the genomes analyzed, based on an unranked independent samples T-test at the 99% confidence level (P value = 5.52E−08). Clustering data is presented in Table 10, it shows positive correlation bewteen TPS gene family size and the corresponding number of gene clusters.
Table 9

Depiction of functional diversity of TPSs using KEGG module

Terpene synthasesEnzyme codes: activityTPSs mapped
Monoterpene synthases (TPSa and TPSg)EC 5.5.1.8 bornyldiphosphate synthase01
EC 4.2.3.25 (TPS14) (3S)-linalool synthase20
EC 4.2.3.15 myrcene/ocimene synthase45
EC 4.2.3.16 (4S)-limonene synthase27
EC 4.2.3.20 (R)-limonene synthase47
Diterpene synthases (TPSc and TPSb)EC 5.5.1.13 ent-copalyldiphosphate synthase83
EC 5.5.1.12 copalyldiphosphate synthase19
EC 5.5.1.14 syn-copalyl-diphosphate synthase26
EC 4.2.3.19 ent-kaurene synthase97
EC 4.2.3.28 ent-cassa-12, 15-diene synthase06
EC 4.2.3.29 ent-sandaracopimaradiene synthase12
EC 4.2.3.30 ent-pimara-8 (14), 15-diene synthase21
EC 4.2.3.18 abieta-7, 13-diene synthase03
EC 4.2.3.33 stemar-13-ene synthase16
EC 4.2.3.34 stemod-13 (17)-ene synthase18
EC 4.2.3.35 syn-pimara-7, 15-diene synthase09
EC 4.2.3.32 levopimaradiene synthase01
Sesquiterpene synthases (TPSb)EC 4.2.3.23 germacrene-A synthase01
EC 4.2.3.22 germacradienol synthase24
EC 4.2.3.21 vetispiradiene synthase18
EC 4.2.3.13 (+)-delta-cadinene synthase45
Total TPSs assigned functional roles539
Fig. 3

Genome-wide maps of TPSs identified in selected monocots and the chlorophyte Chlamydomonas reinhardtii. Clustering is evident in several cases, as discussed in text. All images generated using IGMAP [25] server

Fig. 4

Genome wide maps of TPSs in selected dicot species, discussion in text. Images generated using IGMAP [25] server, developed in our laboratory

Table 10

TPSs genes clustering across various genomes using IGMAP

PlantTotal TPSsTPS present within clustersClustered genes (%)
Monocots
 Poaceae
  Oryza sativa714867.6056338
  Sorghum bicolor463473.91304348
  Zea mays471736.17021277
  Brachypodium distachyon231356.52173913
Eudicots
 Brassicaceae
  Arabidopsis thaliana422764.28571429
  Brassica rapa472553.19148936
 Malvaceae
  Gossypium raimondii766281.57894737
 Rosaceae
  Fragaria vesca604676.66666667
 Solanaceae
  Solanum lycopersicum634673.01587302
  Solanum tuberosum897685.39325843
 Fabaceae
  Glycine max452453.33333333
  Cicer arietinum (desi)2528.00
  Medicago truncatula523159.61538462
  Phaseolus vulgaris494183.67346939
 Salicaceae
  Populus trichocarpa653553.84615385
 Vitaceae
  Vitis vinifera1188370.33898305
Clustering significance (unranked T-test)P < 5.52302E − 08
Depiction of functional diversity of TPSs using KEGG module Genome-wide maps of TPSs identified in selected monocots and the chlorophyte Chlamydomonas reinhardtii. Clustering is evident in several cases, as discussed in text. All images generated using IGMAP [25] server Genome wide maps of TPSs in selected dicot species, discussion in text. Images generated using IGMAP [25] server, developed in our laboratory TPSs genes clustering across various genomes using IGMAP

Discussion

Plant essential oils are complex mixtures of volatile organic compounds, which play indispensable roles in communication, defense, and adaptive evolution. The complete chemical library produced by a plant is referred to as its terpenome. One way of measuring the terpenome is through knowledge-based prediction of the biosynthetic machinery that generates the enormous diversity of these hydrocarbons, and this method has gained popularity in recent years for various gene families, with the advent of large-scale genome sequencing technologies. In this work we have used this method to identify about 4000 putative PTs and TPSs, representing a huge expansion of the hitherto known plant terpenome. Specifically, 2132 and 2957 function based and gene family based TPSs as shown in Tables 5 and 6, of which, 3312 sequences were unique. These TPSs were assigned to various sub-classes and analysed further for functional role prediction, through large-scale genome wide mapping and clustering, GO enrichment and KEGG mapping, resulting in assignment of substrate or product specificity to more than 500 newly identified TPS sequences. Gene ontology results validated our predictions to a large extent and KEGG pathway analysis was used to identify potential catalytic roles and substrate preference for over 500 TPSs. The plant kingdom has been known for widespread occurrence of genome wide duplication events, leading to the evolution of biosynthetic modules and clustered organization of genes, that have been observed and reported for several major classes of plant based secondary metabolites. One of the first studies in this area reported the existence of operon-like clusters of terpene-biosynthetic pathway genes with characteristic modularity, physical clustering, and co-regulation, evident in cyanogenic glycosides of A. thaliana and avenacin triterpenoids in oats [26]. These reports prompted us to develop IGMAP, a novel computational platform for identification, clustering, and interactive mapping of genes, families, and duplications across genomes for annotated NGS data [25]. In this work, we used IGMAP to perform a large scale spatial cluster analysis of novel TPSs that were identified using TERZYME. Spatial cluster analysis via IGMAP enables the identification of clustered arrays of genes on respective chromosomes in various genomes, and whether or not such clustered spatial patterns of genomic positioning are conserved within and between species or taxa. This effort led to generation of genome wide maps of the identified terpenome, which, in turn revealed a significant tendency to cluster, with individual clusters ranging in size from as few as three to over a dozen terpene synthases, as can be seen in Figs. 3 and 4. Spatial cluster analysis also suggested revealed TPSs to be located near the ends of chromosomes or close to centromeric regions, supporting the concept of selective advantage for clustered genes in plants (Figs. 3, 4). Among eudicots, in family Brassicaceae, TPS clusters of two to three genes were present in A. thaliana while in Brassica rapa, TPS gene clusters comprise of two to five genes. Family Rosaceae represented by Fragaria vesca contains tandem clusters of TPSs both in frequency and size (2–7 genes). In Fabaceae however, fewer TPS clusters were observed, also reflected in the terpenome maps of Glycine max and Cicer arietinum, as compared to Medicago truncatula where more distinct gene clusters were found comprising of 2–4 genes. The largest gene clusters were found in case of Phaseolus vulgaris chromosome 11, consisting of about 15 genes. It would be interesting to study this cluster further and characterize it in order to check whether this cluster is involved in synthesis of a specific terpenoid compound. The Malvaceae represented by Gossypium raimondiii has large TPS clusters; located mainly in chromosomes 2, 6, 7, 9 and 11. Populus trichocarpa of family Salicaceae has large TPS clusters on chromosome 19. Further, Vitaceae represented by grape genome was also found to contain several tandem TPS clusters on chromosomes 9, 10, 12 and 19, with two to eight genes in each cluster. In family Solanaceae, both potato and tomato show large TPS clusters of upto 17 genes. Solanum tuberosum also has the highest percentage of TPSs genes (85%) present in clusters. Among monocots, Oryza sativa showed large TPS clusters in chromosome 2, 3 and 4, which were similar, both in context of numbers of clusters and its size as compared to eudicots. Overall, IGMAP [25] is able to identify tandem duplications, but we believe that a large number of segmental duplications would also be present in these plants, and methods to directly assess extent of segmental duplication would be very useful for such studies. The identified TPS clusters represent a new avenue of research and would serve as excellent models for studying genome plasticity or novel mechanisms of adaptive evolution. A recent report describing the detection of a sesterterpene biosynthetic repertoire in Brassicaceae through genome mining, offered to us an opportunity to test the performance of Terzyme, apart from statistical benchmarks shown in earlier sections. The published work included characterization of seven new TPS genes of which five are from Arabidopsis, one from Capsella rubella and one from Brassica oleracea [27]. Terzyme server was able to detect each of the seven STSs, and classified these as sesquiterpene synthases. Furthermore, six of these STSs (namely AT3G14490, AT3G14520, AT3G14520, AT3G32030, AT3G29410 and Carubv10016237m) are already present in our terpenome database, with the exception of the TPS from Brassica oleracea since Terzyme only contains data for complete genomes and the B. oleracea genome is not in the database. These results further support the suitability of Terzyme as a tool of choice for biologists working in the area of terpenome detection and analysis. As an interesting offshoot of this study, we are now integrating Terzyme data with existing species-specific transcriptome datasets and cis-regulome records, in order to construct gene regulatory networks that can shed further light on how the terpenome has expanded and evolved in various taxonomic groups (GY unpublished data). In addition, we recently compared the ‘potential’ terpenome (as predicted by Terzyme) with the ‘actual’ terpenome, integrating volatile compound emission data in conjunction with genomic data to understand how a plant creates the so-called final terpenome, specific to itself, and whether or not plants tap the complete potential for terpene biosynthesis at their disposal according to their genomes [28]. Comparison of actual terpenome with the potential terpenome, as performed in this study, revealed how plants modulate their TPSs expression based on condition or environment-specific needs.

Conclusions

In this work, we describe Terzyme, a new web resource for identification and classification of terpene synthases, towards prediction of TPS and prenyl transferase gene families in a plant genome, followed by a comprehensive large scale assessment of the identified terpenome, based on data from 42 available plant species with complete nuclear genomes. Terzyme represents a collection of profile Hidden Markov Models (HMMs) based on a rigorous analysis of characterized PTs and terpene synthases in plants. Available freely at www.nipgr.res.in/terzyme.html, it is an online, automated, and predictive search tool, for accurate identification and classification of plant terpene synthases and prenyl treansferases, both on the basis of their function and evolutionary relationships. Terzyme has been designed to accept EST input in addition to protein sequences, and this can assist researchers with preliminary annotation of newly emerging NGS data. The Terzyme website has a tutorial section on submission as well as exploratory analysis. Links to the PDB have been provided for all known 3D structures in the family. Over 3000 novel sequences have been identified in this work and analysed further for functional role prediction. The analyses include TPS identification, assignment to functional or gene family based classes, followed by genome wide mapping and clustering of selected novel TPSs. GO enrichment and KEGG mapping were also carried out, to enable assignment of exact catalytic function to more than 500 TPS sequences identified using Terzyme. Taken together, the present work enables future investigations into several other aspects of the terpenome like rational design or alteration of substrate preferences towards user-desired scent bouquets through genetic engineering. The idea of a potential terpenome, as described here, will aid in determination of the exact range of product complexity of terpenoid hydrocarbons that a given species may be capable of, thereby paving the way for use of plant-derived terpenoids in the development of new pharmaceuticals, and commercial compounds.

Methods

Data collection

For identification of the plant terpenome, full genome sequences and TPS protein sequences were collected from the NCBI Protein Database as it includes translations from annotated coding regions in GenBank, RefSeq and TPA, and also records from SwissProt, PIR, PRF, and PDB. TPS sequences were extracted using keyword searches specific to their functional classes. The first dataset comprised of curated sequences belonging to the three major functional classes of TPSs, namely the Monoterpene synthases, Diterpene synthases, and Sesquiterpene synthases. This dataset was called the curated function-based or FB dataset. For identification of the plant prenyl transferases, annotated PT protein sequences were extracted using keyword searches specific to their functional classes i.e. geranyl diphosphate synthase, geranyl geranyl diphosphate synthase and farnesyl diphosphate synthase. Apart from the functional classification (FB dataset) of TPSs, protein sequences were collected for each of the six homology based TPS classes, and this dataset was called the gene family based (GB) dataset. For data on six known GB classes, literature based annotation was used [15, 20]. FB dataset was further divided into two classes, representing monocots and dicots (since our preliminary analysis showed that the predictive performance of the gene family was better when separated into taxon based classes). These six classes of FB data, along with six classes of GB and one PT class, were used for generating 13 profile hidden markov models (HMMs). Protein sequence prediction data was downloaded for 41 sequenced plant genome projects from Phytozome v9.1. Also, for expansion of our analysis across the major taxonomic lineages of the plant kingdom, the available genome sequence of the gymnosperm Picea abies was downloaded from its project website ftp://plantgenie.org/Data/ConGenIE/Picea_abies/ [29]. This combined data representing 42 genomes was called the ‘Plant Sequenced Genome’ (PSG) dataset. In summary, the FB, PT and the GB datasets were used for training and testing the HMM based search algorithm, while the PSG dataset was used for identification of novel TPSs.

TPS search algorithm

Profile HMMs were built using HMMER (Version-HMMER-3.0) [30]. In all, thirteen profile HMMs were developed, one for prenyl transferases, six for function-based class in monocots and dicots and six based on gene family classes. For all the families, the default weighing method, i.e. the Henikoff position-based sequence-weighting scheme was used. Multiple alignments and phylogenetic reconstruction of the sequences in the FB and GB datasets were carried out using CLUSTALX. Structural superimpositions between 3D representatives of various TPS sub-classes was performed at the C-alpha carbon using CLICK tool [31].

Benchmarking of the program

For program testing and prediction accuracy, a positive test dataset was separated at the time of data collection, comprising 10% of both FB and GB datasets in order to have representatives from all six function based classes as well as the six gene-family based classes. In order to check the precision accuracy of the program, and more importantly its negative prediction ability, a negative dataset comprising closely related sequences was added to the test set for functional classification as well as gene family based classification. Thus the test data set comprised of both negative as well as positive sequences. The predictive performance of all twelve profile HMMs was tested using the statistical concepts of sensitivity and accuracy. Sensitivity measures the proportion of actual positives, which are correctly identified as such, and was calculated for each family as the ratio of true positives to combined true positives and false negatives. Accuracy estimates the overall proportion of true positives in the population. In addition to these parameters, benchmarking of the search algorithm also involved comparison of Terzyme performance with existing global annotation databases like PFAM, PANTHER and Interpro.

The Terzyme server

The HMMER based analysis pipeline developed for identification of TPSs as described above, was converted into a web server using HTML and back-end CGI coding. All new TPS and PT identifications were also incorporated into the same web resource, designated as Terzyme, available online freely to the scientific community at http://nipgr.res.in/terzyme.html. The Terzyme webserver is compatible across platforms, and has been tested on several browsers and platforms, including Safari, Firefox, Konqueror and IE on Macintosh, Linux as well as Windows workstations. Perl and shell scripts were used to sort the top hits based on highest score obtained from all 13 profile HMMs. If user input is EST data, an additional e-value filter of 0.01 is applied. The EST data is subjected to a six-frame translation using Transeq tool of EMBOSS. For protein sequence data, default e-value is used with all profile HMMs. Shell scripts have been incorporated to accept simultaneous requests from multiple users making Terzyme a more robust platform. To aid further confirmation of predictive results, a protein secondary structure prediction server has been incorporated into the web resource, based on PSIPRED 3.5 [32]. This server can handle multiple fasta queries for efficient performance. The backend of this server used UniRef90 [33] dataset for psiBLAST [34] within PSIPRED runs. The PSIPRED output gets color-coded according to predicted secondary structure elements through in house PERL scripts.

Analysis of the terpenome

The novel TPSs thus identified were used for downstream analysis of the plant terpenome. For this, we used IGMAP [25], a program developed earlier by our group for genome-wide mapping and clustering studies, including assessment of spatial patterns of newly identified TPSs on the respective genomes and to find out whether clustering patterns indeed exist in plant terpenomes. TPS sequences identified from the PSG dataset as described above, were analyzed further, to gain a better insight into their product complexity. This was done via GO enrichment analysis, phylogenetics and assessment of intron–exon patterns. For gene ontology studies, Blast2GO [35] tool was used to annotate the novel TPSs belonging to various functional classes. The main annotation pipeline of the tool consists of three sequential steps namely: blast, mapping and annotation. Blast2GO uses the Basic Local Alignment Search Tool (BLAST) to find sequences similar to a query set. Retrieval of GO terms associated to the hits obtained from a BLAST search followed this step. The step assigns evaluated set of GO annotations for the input query sequences. Interpro annotations in Blast2GO were used to retrieve domain/motif information for each sequence. In the final step, predicted TPS enzymes were mapped onto corresponding KEGG pathways [35] and enzyme codes were obtained by mapping from equivalent GO’s. This step led to the exploration of the functional diversity of predicted TPSs in terms of their catalytic activity and substrate preferences. Positive Dataset for training profile HMMs. A detailed list of accession IDs for each sequence used in the FB and PT datasets Positive Dataset for training profile HMMs. A detailed list of accession IDs for each sequence used in the GB dataset Detailed information on accession IDs and substrate preferences for 539 putative TPS sequences
  32 in total

1.  The PSIPRED protein structure prediction server.

Authors:  L J McGuffin; K Bryson; D T Jones
Journal:  Bioinformatics       Date:  2000-04       Impact factor: 6.937

2.  Enzyme Mechanisms for Polycyclic Triterpene Formation.

Authors: 
Journal:  Angew Chem Int Ed Engl       Date:  2000-08-18       Impact factor: 15.336

Review 3.  Secondary metabolic gene clusters: evolutionary toolkits for chemical innovation.

Authors:  Anne Osbourn
Journal:  Trends Genet       Date:  2010-08-24       Impact factor: 11.639

4.  Plant terpenoid synthases: molecular biology and phylogenetic analysis.

Authors:  J Bohlmann; G Meyer-Gauen; R Croteau
Journal:  Proc Natl Acad Sci U S A       Date:  1998-04-14       Impact factor: 11.205

Review 5.  The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom.

Authors:  Feng Chen; Dorothea Tholl; Jörg Bohlmann; Eran Pichersky
Journal:  Plant J       Date:  2011-04       Impact factor: 6.417

6.  Functional characterization of nine Norway Spruce TPS genes and evolution of gymnosperm terpene synthases of the TPS-d subfamily.

Authors:  Diane M Martin; Jenny Fäldt; Jörg Bohlmann
Journal:  Plant Physiol       Date:  2004-08-13       Impact factor: 8.340

7.  Genome organization in Arabidopsis thaliana: a survey for genes involved in isoprenoid and chlorophyll metabolism.

Authors:  B Markus Lange; Majid Ghassemian
Journal:  Plant Mol Biol       Date:  2003-04       Impact factor: 4.076

8.  Functional plasticity of paralogous diterpene synthases involved in conifer defense.

Authors:  Christopher I Keeling; Sabrina Weisshaar; Roy P C Lin; Jörg Bohlmann
Journal:  Proc Natl Acad Sci U S A       Date:  2008-01-15       Impact factor: 11.205

Review 9.  Managing and manipulating carbocations in biology: terpenoid cyclase structure and mechanism.

Authors:  C A Lesburg; J M Caruthers; C M Paschall; D W Christianson
Journal:  Curr Opin Struct Biol       Date:  1998-12       Impact factor: 6.809

10.  CLICK--topology-independent comparison of biomolecular 3D structures.

Authors:  M N Nguyen; K P Tan; M S Madhusudhan
Journal:  Nucleic Acids Res       Date:  2011-05-20       Impact factor: 16.971

View more
  9 in total

1.  Genome-wide detection and classification of terpene synthase genes in Aquilaria agallochum.

Authors:  Ankur Das; Khaleda Begum; Suraiya Akhtar; Raja Ahmed; Ram Kulkarni; Sofia Banu
Journal:  Physiol Mol Biol Plants       Date:  2021-08-05

2.  The Chromosome-Scale Assembly of the Curcuma alismatifolia Genome Provides Insight Into Anthocyanin and Terpenoid Biosynthesis.

Authors:  Qing Dong; Qing-Cheng Zou; Li-Hui Mao; Dan-Qing Tian; Wei Hu; Xue-Rui Cao; Hua-Qiao Ding
Journal:  Front Plant Sci       Date:  2022-06-15       Impact factor: 6.627

3.  Genome assembly of Chiococca alba uncovers key enzymes involved in the biosynthesis of unusual terpenoids.

Authors:  Kin H Lau; Wajid Waheed Bhat; John P Hamilton; Joshua C Wood; Brieanne Vaillancourt; Krystle Wiegert-Rininger; Linsey Newton; Britta Hamberger; Daniel Holmes; Bjoern Hamberger; C Robin Buell
Journal:  DNA Res       Date:  2020-06-01       Impact factor: 4.458

4.  Transcriptome Landscape Variation in the Genus Thymus.

Authors:  Aboozar Soorni; Tahereh Borna; Ali Alemardan; Manohar Chakrabarti; Arthur G Hunt; Aureliano Bombarely
Journal:  Genes (Basel)       Date:  2019-08-16       Impact factor: 4.096

Review 5.  Terpenes and Terpenoids in Plants: Interactions with Environment and Insects.

Authors:  Delbert Almerick T Boncan; Stacey S K Tsang; Chade Li; Ivy H T Lee; Hon-Ming Lam; Ting-Fung Chan; Jerome H L Hui
Journal:  Int J Mol Sci       Date:  2020-10-06       Impact factor: 5.923

6.  Integrating structure-based machine learning and co-evolution to investigate specificity in plant sesquiterpene synthases.

Authors:  Janani Durairaj; Elena Melillo; Harro J Bouwmeester; Jules Beekwilder; Dick de Ridder; Aalt D J van Dijk
Journal:  PLoS Comput Biol       Date:  2021-03-22       Impact factor: 4.475

7.  Genome-Wide Identification and Expression Analysis of Terpene Synthase Genes in Cymbidium faberi.

Authors:  Qian-Qian Wang; Meng-Jia Zhu; Xia Yu; Yuan-Yang Bi; Zhuang Zhou; Ming-Kun Chen; Jiating Chen; Diyang Zhang; Ye Ai; Zhong-Jian Liu; Siren Lan
Journal:  Front Plant Sci       Date:  2021-11-25       Impact factor: 5.753

8.  Metabolism and transcriptome profiling provides insight into the genes and transcription factors involved in monoterpene biosynthesis of borneol chemotype of Cinnamomum camphora induced by mechanical damage.

Authors:  Zerui Yang; Chunzhu Xie; Yuying Huang; Wenli An; Shanshan Liu; Song Huang; Xiasheng Zheng
Journal:  PeerJ       Date:  2021-07-01       Impact factor: 2.984

9.  Tissue-Specific Expression of the Terpene Synthase Family Genes in Rosa chinensis and Effect of Abiotic Stress Conditions.

Authors:  Yuhang Yan; Mouliang Li; Xiaoni Zhang; Weilong Kong; Mohammed Bendahmane; Manzhu Bao; Xiaopeng Fu
Journal:  Genes (Basel)       Date:  2022-03-20       Impact factor: 4.096

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.