Rui Cao¹, Xinlong Wu², Qi Wang¹, Pengyan Qi¹, Yuna Zhang¹, Lizhi Wang¹, Chao Sun³
¹ School of Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, P. R. China.
² College of Pharmaceutical Engineering of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, P. R. China.
³ Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100193, P. R. China.
Abstract
Enzymes drive protein engineering, directed evolution, and the biochemical industry, and they are the cornerstone of metabolic engineering. Basidiomycetes are known to produce a large variety of terpenoids with unique structures, yet basidiomycetous terpene synthases remain largely untapped. Here, we present a modeling method for obtaining specific terpene synthases. Aided by bioinformatics analysis, three γ-cadinene synthases from Ganoderma lucidum and Ganoderma sinensis were accurately predicted and then identified experimentally. Based on the highly conserved amino acid motifs of the characterized γ-cadinene synthases, a model enzyme sequence was assembled as model 1. Using this model as a template, 67 homologous γ-cadinene synthase sequences were retrieved from the National Center for Biotechnology Information (NCBI). Because these 67 sequences share the same gene structure and conserved motifs similar to those of model 1, they were used to refine the γ-cadinene synthase model by the same construction method, yielding model 2. Bioinformatics analysis shows that the conserved regions of models 1 and 2 are highly similar. Five of the 67 sequences were then tested experimentally, and all five (100%) were γ-cadinene synthases, confirming the predictive accuracy of the γ-cadinene synthase model. Using the same approach, we also reanalyzed the characterized Δ6-protoilludene synthases in fungi and (−)-α-bisabolol synthases in plants, each of which has its own unique conserved motifs. We expect this method to be applicable to other terpenoid synthases with similar or identical functions in basidiomycetes, ascomycetes, bacteria, and plants, and to provide rich enzyme resources.
Basidiomycetes, with more than 30,000 species, are a major division of higher fungi and key players both in the global carbon cycle and in the production of small-molecule bioactive compounds.[1,2] The natural products of mushrooms show breathtaking structural diversity, especially the sesquiterpenoids, diterpenoids, and triterpenoids, which are biosynthesized by terpene synthases (TSs) from the precursors geranyl diphosphate (GPP), (2E,6E)-farnesyl diphosphate (FPP), and geranylgeranyl diphosphate (GGPP).[3−6] Tailoring enzymes encoded in terpene gene clusters then further increase the structural diversity of terpenoids, yielding many reported bioactive natural products, such as the antimicrobial compound lagopodin B from Coprinopsis cinerea,[7,8] the antitumor compounds illudin M and S from Omphalotus olearius and O. illudens,[9,10] and the antibiotic melleolides.[11,12] Terpenoids are natural products with extremely diverse structures, but only a small fraction of them has been explored thus far.[13]
Sesquiterpenoids are the largest terpenoid group
in Basidiomycetes, and a considerable number of compounds with chemical structures different from those produced by other microbes have been identified. However, because Basidiomycetes are difficult to culture under laboratory conditions and difficult to manipulate genetically, their sesquiterpene synthase (STS) biosynthetic pathways have been studied far less than those of plants, bacteria, and ascomycetes.[5,14−16] Each basidiomycete genome encodes on average 10–20 putative STS homologues,[10] indicating that basidiomycetous sesquiterpenoids and STSs represent rich but largely unexploited natural resources. Over the past decade, the continuous development of sequencing technology and the resulting flood of genomic data have laid a foundation for the study of STSs but have also made gene characterization more challenging. The groundbreaking work, the identification of five STSs (Cop1–4 and Cop6) from C. cinerea,[7] was carried out in 2009. In 2012, the STSs Omp1–10 from O. olearius, together with 40 available basidiomycete genomes, provided a prediction framework for sesquiterpene biosynthesis in basidiomycetes that predicts the products of the corresponding cyclization mechanism according to the five clades of the phylogenetic tree.[10] In 2013, the cloning and characterization of STSs [Stehi1_64702 (now ShSTS15), Stehi1_73029 (now ShSTS16), and Stehi1_25180 (now ShSTS18)] from S. hirsutum confirmed the accuracy of this prediction framework,[17] but the framework cannot predict the specific structures of sesquiterpenoids. In 2020, when the STSs Agr1–9 from Agrocybe aegerita were identified, sequence similarity networks (SSNs) were built with the Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST, http://efi.igb.illinois.edu/efi-est/)[18] to probe similar or isofunctional fungal STSs and their products. This further broadened knowledge of basidiomycetous STSs, but Galma_104215 from Galerina marginata and Pilcr_825684 from P. croceum were not correctly predicted,[19] indicating that SSN-based prediction of STS function has limitations. This study therefore proposes a new approach for accurately identifying target STSs and extends the STS prediction framework.
In the recently sequenced and reported genomes of G. lucidum[20] and G. sinensis,[21] we have
identified three highly selective γ-cadinene synthases (γ-CSs). Ganoderma, a world-famous medicinal macrofungus in the Basidiomycetes, has been used to treat a variety of diseases for more than 2000 years. Ganoderma is one of the most deeply studied medicinal model organisms[22] and is also cultivated as bonsai, which has considerable commercial value. Triterpenes and polysaccharides are the most studied bioactive components in Ganoderma.[23] Oxygenated sesquiterpene derivatives with antifungal activity have also been reported,[24] but Ganoderma sesquiterpenoids remain little studied and merit further investigation. Three STSs (GS11330, GS14272, and GS02363) from G. sinensis[25,26] and one STS (GL26009) from G. lucidum[27] have been identified previously. In this study, three STSs with the same gene structure and sequence characteristics were selected from the STS families of these two species and were cloned and characterized in Escherichia coli chassis strains.
A homology model of γ-CS was established on the basis of sequence conservation, and five sequences verified the accuracy of this model for identifying novel γ-CSs. When we applied this modeling method to the characterized Δ6-protoilludene synthases[19] in Basidiomycetes and (−)-α-bisabolol synthases (BOSs)[28] in plants, we found that their highly conserved sequences can likewise be used to build models and to discover new and valuable enzymes and gene clusters. In the future, this modeling method is expected to be extended to Basidiomycetes, ascomycetes, plants, and bacteria, providing abundant enzyme resources for the production of target products and intermediates.
Results and Discussion
Identification of γ-CSs in G. lucidum and G. sinensis
Over the past decade, more than 80 TSs from Basidiomycetes have been characterized.[1] According to the literature and the JGI basidiomycete genome database,[29] the number of sequenced basidiomycete genomes has increased from 113 in 2014[14] to 605 at present. This influx of sequenced and annotated genes has brought new opportunities and challenges to natural product research. In the early stage of this study, by analyzing the genome sequencing data of G. lucidum and G. sinensis, we found two particularly interesting STS genes, GlSTS6 and GsSTS43, which have the highest exon number in their respective STS families and share the same exon sizes and splicing pattern (Table 1). We preliminarily speculated that genes with this nine-exon structure may have evolved a conserved function, so we set out to predict their function by bioinformatics and to characterize them experimentally.
Table 1
Comparison of Exon/Intron Sizes (bp) in the Subtree; length = total gene length from exon 1 to exon 9
Gene        exon1  intron1  exon2  intron2  exon3  intron3  exon4  intron4  exon5  intron5  exon6  intron6  exon7  intron7  exon8  intron8  exon9  length
CpSTS18       131       52    139       52    137       58     48       63    134       53    107       55     18       52    244       58    224    1625
STC9          131       57    139       53    137       48     48       50    134       48    107       56     18       49    244       55    122    1496
ShSTS5        140       55    139       57    137       59     48       48    134       54    107       58     18       52    244       57     86    1493
GsSTS45b      140       67    136       64    137       51     48       54    134       59    107       62     18       72    244      102    269    1764
GlSTS6        140       80    136       66    137       51     48       62    134       49    107       62     18       55    244       76    296    1761
GsSTS43       140       80    136       66    137       53     48       63    134       46    107       62     18       56    244       85    293    1768
We compiled the basidiomycete STS genes characterized in the last 10 years, recorded their exon numbers, and found that the average exon number was 5.53 (see the Supporting Information). Nine exons is therefore well above average, and this distinctive structure may be related to function. A phylogenetic tree of the characterized STS genes was constructed together with GlSTS6 and GsSTS43. Both genes fell in the same small clade as STC9,[30] ShSTS5, and CpSTS18[31] (see the Supporting Information), all of which are specific enzymes that produce γ-cadinene as a single product. We therefore further speculated that GlSTS6 and GsSTS43 may also be γ-CSs. To improve the accuracy of the prediction, we selected genes that produce γ-cadinene and genes with nine exons and reconstructed a gene structure-phylogenetic tree (Figure 1A). The gene structures of GlSTS6 and GsSTS43 were highly similar to those of STC9, ShSTS5, and CpSTS18. In contrast, PpSTS03,[32] Pilcr_825684,[19] and Omp5a/b[10] also produce γ-cadinene but are phylogenetically distant and have different gene structures, which may reflect their lower product specificity. Although Pro1[11] and AcTPS7[33] have nine exons, their gene structures and functions are very different (see Figure 1A and the Supporting Information). A comparison of the sizes of exons 1–9 of GlSTS6, GsSTS43, STC9, ShSTS5, and CpSTS18 showed that exons 3–8 were identical in size, whereas exons 1, 2, and 9 differed, indicating that these genes share a well-conserved gene structure. In addition, multiple sequence alignment showed that pairwise sequence similarity ranged from 3.8% to 51.8% (see the Supporting Information). Although the overall sequence similarity was not high, many motifs were highly conserved (Figure 1B). On the basis of this high conservation of gene structure and amino acid residues, we are more convinced that GlSTS6 and GsSTS43 are γ-CSs.
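The exon comparison above can be reproduced directly from Table 1. The short Python sketch below is illustrative only: it assumes that the listed exons are coding exons with exon 1 beginning at the start codon (consistent with every gene's exon total being a multiple of 3), and the helper names are hypothetical. It reports which exons have the same size in all six genes and computes each intron's phase, i.e., the number of coding bases preceding the intron modulo 3.

```python
# Exon sizes (bp) transcribed from Table 1; intron sizes are not needed here.
EXONS = {
    "CpSTS18":  [131, 139, 137, 48, 134, 107, 18, 244, 224],
    "STC9":     [131, 139, 137, 48, 134, 107, 18, 244, 122],
    "ShSTS5":   [140, 139, 137, 48, 134, 107, 18, 244, 86],
    "GsSTS45b": [140, 136, 137, 48, 134, 107, 18, 244, 269],
    "GlSTS6":   [140, 136, 137, 48, 134, 107, 18, 244, 296],
    "GsSTS43":  [140, 136, 137, 48, 134, 107, 18, 244, 293],
}


def identical_exons(exons):
    """Return the 1-based indices of exons whose size is the same in every gene."""
    columns = zip(*exons.values())
    return [i for i, col in enumerate(columns, start=1) if len(set(col)) == 1]


def intron_phases(exon_sizes):
    """Phase of each intron = coding bases before it, modulo 3.

    Assumes exon 1 starts at the start codon (coding exons only).
    """
    phases, cumulative = [], 0
    for size in exon_sizes[:-1]:  # the last exon is not followed by an intron
        cumulative += size
        phases.append(cumulative % 3)
    return phases


if __name__ == "__main__":
    print("Exons identical in all genes:", identical_exons(EXONS))
    for gene, sizes in EXONS.items():
        print(gene, "coding length:", sum(sizes), "intron phases:", intron_phases(sizes))
```

Run on the six genes in Table 1, the sketch reports exons 3–8 as size-identical and the same intron phase pattern (2, 0, 2, 2, 1, 0, 0, 1) for every gene, which is one concrete way to read "the same splicing pattern".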
Figure 1
Gene structure-phylogenetic tree and sequence alignment of γ-CSs. (A) Gene structure-phylogenetic tree of γ-CS sequences and nine-exon genes. (B) Amino acid sequence alignments of six γ-CSs. Amino acids range from low (blue) to high (red) conservation.
Total RNA was extracted from G. lucidum and G. sinensis, and GlSTS6 and GsSTS43 were cloned from cDNA. The cloning vectors pGEM-T Easy-GlSTS6 and pGEM-T Easy-GsSTS43 and the fusion protein expression vectors pET32a-GlSTS6 and pET32a-GsSTS43 were all successfully constructed. Soluble protein expression was low, and most of the protein formed inclusion bodies (Figure 2A).
Because no commercial standard of γ-cadinene is available, we synthesized the STC9 gene as a positive control. The fusion protein expression vectors pET32a-GlSTS6, pET32a-GsSTS43, and pET32a-STC9 were transformed into E. coli Rosetta (DE3), and the products in the headspace of the E. coli cultures were detected by HS-SPME-GC-MS. E. coli cells expressing GlSTS6 and GsSTS43 produced a single peak of γ-cadinene (the MS spectra for γ-cadinene can be found in the Supporting Information), consistent with the major product of STC9. STC9 also produced minor products with low responses, which together accounted for ≈3.4% (relative peak area) of the total sesquiterpenes in the headspace of E. coli cultures expressing STC9. In addition, when we cloned GsSTS45, a member of the G. sinensis STS family, two correct transcripts, GsSTS45a and GsSTS45b, were obtained by manual reannotation; GsSTS45b is a functional single-product γ-CS whose gene structure and amino acid residues are similar to those of GlSTS6, GsSTS43, STC9, ShSTS5, and CpSTS18 (Figure 2B,C). Therefore, gene function can be initially predicted by bioinformatics analyses such as exon size comparison, gene structure-phylogenetic trees, and sequence alignment.
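The ≈3.4% figure above is a relative peak area: each peak's integrated area divided by the summed area of all sesquiterpene peaks in the chromatogram. A minimal Python sketch of that calculation follows; the peak names and areas are invented placeholders, and the calculation implicitly assumes a similar detector response for all sesquiterpenes, so it estimates composition rather than absolute amounts.

```python
def relative_areas(peaks):
    """Return each peak's share (%) of the summed sesquiterpene peak area.

    Relative peak area assumes similar detector response for all peaks,
    so it is a proxy for composition, not absolute quantitation.
    """
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peaks.items()}


# Hypothetical integrated areas (arbitrary units) for one headspace chromatogram;
# these numbers are placeholders, not measurements from this study.
example = {"gamma-cadinene": 970.0, "minor peak 1": 20.0, "minor peak 2": 10.0}
shares = relative_areas(example)
minor_total = sum(v for k, v in shares.items() if k != "gamma-cadinene")
print(f"minor products: {minor_total:.1f}% of total sesquiterpene peak area")
```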
Figure 2
Characterization of γ-CSs from G. lucidum and G. sinensis in E. coli. (A) Expression of various γ-CSs from G. lucidum and G. sinensis. Lane M, molecular weight marker protein; S, soluble fraction of total cell extracts; and I, insoluble fraction of total cell extracts. (B) GC–MS chromatograms of culture supernatants of strains GlSTS6, GsSTS43, and GsSTS45b, the control strain pET32a_ctrl, and the positive control strain STC9_posi. (C) Prediction and cloning of a gene in the G. sinensis STS family. The predicted gene was GsSTS45; the cloned and reannotated genes were GsSTS45a and GsSTS45b, and only GsSTS45b was shown to be functional.
Homology Search for Unidentified γ-CSs in Basidiomycetes
To explore whether other highly conserved genes with this special gene structure have the same function in Basidiomycetes, we established homology model 1 of γ-CS (see the Supporting Information) from the conserved regions of GlSTS6, GsSTS43, GsSTS45b, STC9, ShSTS5, and CpSTS18: amino acid residues with high similarity were extracted and recombined into a new γ-CS protein sequence. This sequence was used as the query in a protein BLAST (BLASTP) search of the NCBI nonredundant protein sequence database (nr), and we manually removed hits whose exons or introns were too long or too short, whose exon number differed, or that lacked the conserved terpene synthase motifs DDXXD and NSE/DTE. Finally, 67 genes were retained, distributed among 31 genera and 58 species (see the Supporting Information).
To improve the accuracy of prediction model 1, we combined three strategies: (1) comparison of gene structure, (2) multiple sequence alignment, and (3) homology modeling. First, the 67 sequences were used to construct a gene structure-phylogenetic tree, which was divided into approximately three classes; overall, the intron phase was conserved (see the Supporting Information). The gene structures in branch I are the most similar, and branch Ia consists almost entirely of Suillus species, which may be rich in γ-cadinene. Exon 9 in branch II is longer than in branches I and III, indicating that the C-terminus is truncated to varying degrees in branches I and III or extended in branch II. To analyze these structures precisely, we counted the bases in each exon of each gene; the averages were as follows: exon 1, 134 bp; exon 2, 139 bp; exon 3, 137 bp; exon 4, 48 bp; exon 5, 134 bp; exon 6, 107 bp; exon 7, 18 bp; exon 8, 244 bp; and exon 9, 119 bp, with exon 9 most often being 74 bp (see the Supporting Information). The average sizes of exons 3–8 are identical to those of the genes used to build homology model 1, so we infer that the shared splicing pattern of exons 3–8 may be related to STS function. Then, a multiple sequence alignment of the 67 amino acid sequences was carried out, and most of the amino acid residues were found to be highly conserved.
Beyond the DDXXD and NSE/DTE motifs shared by terpene synthases, the sequences also contained many unique conserved motifs, especially upstream and downstream of these two motifs, such as FFXWAFSXDDLSDEGXLQXFP and DXMTWPNDLCSFNKEQXDGDXQNLV, as well as several other conserved regions, such as FDXXAXLSFPDAD, PYAAMLXD, FIXXRR, and QGTVXWYYXSPRYF (see the Supporting Information). These conserved regions are almost the same as those of homology model 1. On this basis, we established homology model 2, in which the amino acid residues with high similarity among the 67 sequences were extracted and reassembled into a new protein sequence (Figure 3). Our goal is to use this sequence and its highly conserved regions (>80% conservation) as templates to identify γ-CSs in Basidiomycetes and to provide a new method for the targeted search and accurate identification of important STSs.
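The motif notation used above (X = any residue) maps directly onto regular expressions, which makes the motif-based screening of BLAST hits easy to automate. The Python sketch below is a simplified, hypothetical filter. DDXXD follows the text, whereas the NSE/DTE pattern uses the commonly cited consensus (N/D)D(L/I/V)X(S/T)XXXE, which is an assumption here. The sketch does not reproduce the manual exon/intron-length checks described above.

```python
import re

# DDXXD as written in the text; the NSE/DTE consensus below is an assumed form.
DDXXD = re.compile(r"DD..D")
NSE_DTE = re.compile(r"[ND]D[LIV].[ST]...E")

# Conserved regions reported for the 67 homologues.
MODEL2_MOTIFS = ["FFXWAFSXDDLSDEGXLQXFP", "DXMTWPNDLCSFNKEQXDGDXQNLV",
                 "FDXXAXLSFPDAD", "PYAAMLXD", "FIXXRR", "QGTVXWYYXSPRYF"]


def motif_to_regex(motif):
    """Convert motif notation such as 'PYAAMLXD' (X = any residue) into a regex."""
    return re.compile("".join("." if aa == "X" else re.escape(aa) for aa in motif))


def passes_filter(protein):
    """Keep a hit only if it carries both class I terpene synthase signatures."""
    return bool(DDXXD.search(protein)) and bool(NSE_DTE.search(protein))


def motif_hits(protein):
    """Return the model-2 motifs that occur in the protein sequence."""
    return [m for m in MODEL2_MOTIFS if motif_to_regex(m).search(protein)]


# Toy sequence (hypothetical), built to contain two of the motifs.
toy = "MKFFAWAFSGDDLSDEGKLQAFPGGNDLTSAAAEPYAAMLQD"
print(passes_filter(toy), motif_hits(toy))
```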
Figure 3
Construction of homology model 2 of γ-CS. Model 2 is based on the conserved regions of 67 sequences (see the Supporting Information) retrieved from NCBI with model 1 (see the Supporting Information). Red represents 100% conservation, and blue represents >80% conservation of amino acids.
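The caption above describes model 2 as a reassembly of the most conserved residues, colored red (100%) or blue (>80%). A minimal Python sketch of that idea follows. It assumes a precomputed multiple sequence alignment, takes the majority residue of each column as the model residue, and drops columns whose majority character is a gap; these rules and the function name are assumptions for illustration, not the authors' exact procedure.

```python
from collections import Counter


def build_consensus(alignment, blue_threshold=0.8):
    """Assemble a model sequence from an alignment and label its conservation.

    alignment: equal-length aligned protein strings ('-' marks a gap).
    Returns (model, classes): the majority residue of each kept column and,
    per kept column, 'red' (identical in all sequences), 'blue' (shared by
    more than blue_threshold of the sequences), or '' (less conserved).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n = len(alignment)
    model, classes = [], []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            continue  # assumption: gap-majority columns are left out of the model
        fraction = count / n
        model.append(residue)
        if fraction == 1.0:
            classes.append("red")
        elif fraction > blue_threshold:
            classes.append("blue")
        else:
            classes.append("")
    return "".join(model), classes


# Toy alignment (hypothetical sequences, not the 67 homologues).
toy = ["MDDLA-", "MDDLA-", "MDDLA-", "MDDLA-", "MDDISK", "M-DISK"]
model, classes = build_consensus(toy)
print(model, classes)
```

In the toy alignment, the second column is 5/6 identical, so it is labeled blue, while 4/6 columns stay unlabeled; the 80% cutoff is exclusive, matching the ">80%" wording of the caption.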
Verification of Predictive Ability for Homology Model 2
Establishing the connection between sequence conservation and products is particularly important for building more accurate prediction models and for exploring or designing enzymes with better yield, activity, and selectivity. To test the predictive ability of homology model 2 across the branches of the phylogenetic tree, we manually selected five sequences highly similar to the conserved regions of model 2 for functional verification: Cligib1_1787513 from Infundibulicybe gibba, Dicsq1_63165 from Dichomitus squalens LYAD-421 SS1, E4T56_gene18889 from Termitomyces sp. T112, Pisti1_26981 from Pisolithus tinctorius Marx 270, and Suifus1_441513 from Suillus fuscotomentosus (see the Supporting Information). Each sequence was chemically synthesized and ligated into the pET32a vector to construct a fusion protein expression vector, which was transformed into E. coli chassis strains for product formation. Each fusion protein was approximately 65 kDa, and Pisti1_26981 showed the highest soluble protein content (Figure 4A). The E. coli clones expressing Cligib1_1787513, Dicsq1_63165, E4T56_gene18889, Pisti1_26981, and Suifus1_441513 all produced γ-cadinene (Figure 4B), consistent with STC9, ShSTS5, and CpSTS18. Like GlSTS6, GsSTS43, and GsSTS45b, Dicsq1_63165 and Pisti1_26981 produced γ-cadinene as a single product, whereas Cligib1_1787513, E4T56_gene18889, and Suifus1_441513, like STC9, produced γ-cadinene as the main product (>95%) together with the same two minor product peaks (<5% in total). The γ-cadinene yield of Suifus1_441513 was 6.08 times that of Dicsq1_63165 (Figure 4C). These results confirm the predictive accuracy of homology model 2 constructed in this work and show that it can also be used to find more active enzymes. When we used homology model 2 to search for homologous proteins among 605 basidiomycete genomes in the JGI Genome database, we found a large number of sequences with gene structures similar to homology model 2, which provide abundant candidate genes for the accurate identification of γ-CSs. Our workflow may also be used to find other TSs with potential application value and better selectivity.
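The five candidates above were chosen manually. One hypothetical way to make that choice reproducible is to score each candidate, once aligned to model 2, by the fraction of red and blue model positions at which it carries the model residue, and then pick the top scorer within each phylogenetic branch. The sketch below assumes candidates are already aligned to the model (same length, gaps as '-'); the scoring rule, names, and sequences are illustrative, not the authors' criterion.

```python
def conserved_match_score(model, classes, candidate):
    """Fraction of the model's red/blue positions where the aligned candidate agrees."""
    if len(candidate) != len(model) or len(classes) != len(model):
        raise ValueError("candidate and classes must match the model length")
    positions = [i for i, label in enumerate(classes) if label in ("red", "blue")]
    if not positions:
        return 0.0
    matches = sum(1 for i in positions if candidate[i] == model[i])
    return matches / len(positions)


def rank_candidates(model, classes, candidates):
    """Return (name, score) pairs sorted from best to worst agreement."""
    scores = {name: conserved_match_score(model, classes, seq)
              for name, seq in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical model, conservation labels, and pre-aligned candidates.
model = "MDDLA"
classes = ["red", "blue", "red", "", ""]
candidates = {"cand1": "MDDIS", "cand2": "MEDLA", "cand3": "KEDLA"}
for name, score in rank_candidates(model, classes, candidates):
    print(f"{name}: {score:.2f}")
```

Restricting the score to red and blue positions means differences in poorly conserved regions, such as the variable C-terminal exon 9, do not penalize a candidate.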
Figure 4
Verification of the predictive ability of homology model 2. γ-CSs were selected on the principle that, within each branch of the phylogenetic tree, the chosen amino acid sequence is the one most similar to the red and blue amino acids of model 2. (A) Expression of five selected γ-CSs. Lane M, molecular weight marker protein; S, soluble fraction of total cell extracts; and I, insoluble fraction of total cell extracts. (B) GC–MS chromatograms of culture supernatants of the five γ-CS strains. (C) Production of γ-cadinene by the five γ-CSs. STC9 production of γ-cadinene is set to 100%; error bars indicate standard deviations determined from triplicates.
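Figure 4C expresses γ-cadinene production relative to STC9 (set to 100%) with standard deviations from triplicates. The Python sketch below shows that normalization and the fold-change comparison used in the text (for example, Suifus1_441513 versus Dicsq1_63165). The replicate values are placeholders, and the use of the sample standard deviation (n − 1) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def relative_yield(replicates, reference_replicates):
    """Mean and sample SD of a strain's yield as % of the reference strain's mean."""
    reference = mean(reference_replicates)
    percent = [100.0 * value / reference for value in replicates]
    return mean(percent), stdev(percent)


def fold_change(replicates_a, replicates_b):
    """Ratio of mean yields, e.g. one candidate enzyme versus another."""
    return mean(replicates_a) / mean(replicates_b)


# Placeholder peak areas from triplicate cultures (arbitrary units, not real data).
stc9 = [100, 110, 90]
candidate_a = [50, 55, 45]
candidate_b = [300, 310, 290]

for name, reps in {"candidate_a": candidate_a, "candidate_b": candidate_b}.items():
    m, sd = relative_yield(reps, stc9)
    print(f"{name}: {m:.1f} +/- {sd:.1f} % of STC9")
print(f"fold change b/a: {fold_change(candidate_b, candidate_a):.2f}")
```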
Study on the Functions of Other TSs
In previous work, characterized TSs were used to predict possible TS genes in sequenced and annotated genome databases, and these genes were then cloned directly or synthesized after their cyclization mechanisms or products were predicted from phylogenetic trees. SSNs constructed with the Enzyme Function Initiative-Enzyme Similarity Tool have also been used to predict or characterize new TSs, such as 15 characterized Δ6-protoilludene synthases.[19] Novel enzymes with higher activity and yield have also been found by multiple sequence alignment, such as (−)-α-bisabolol synthase,[28] or by combining two strategies, (1) full-sequence alignment and (2) comparison of predicted active sites, to identify linalool synthases.[34] All of these TS discovery efforts share one feature: sequence conservation is used to predict and identify TSs. This coincides with the central idea of this study, namely, that a model built from highly conserved sequence features can be used to find the required enzymes. We therefore used this method to reanalyze some characterized TSs and to establish the relationship between sequence conservation and specific products.
After the gene sequences of 15 characterized Δ6-protoilludene synthases, obtained by cDNA cloning or by synthesis after reannotation, were collected and used to reconstruct a gene structure-phylogenetic tree, we found that the splicing patterns of 14 of the genes were similar; the exception was Pro1 (Figure 5A). Based on its intron/exon pattern, Pro1 may have evolved from a distant ancestral plant terpene synthase containing 12 introns and 13 exons.[35] Dia1, a Δ6-protoilludene synthase from the ascomycete Diaporthe sp. BR109, has been identified as the result of a cross-phyla horizontal gene transfer event between Basidiomycota and Ascomycota.[36−39] The small Omp7 cluster arose by gene duplication from the large Omp6 cluster, which is thought to improve rate-limiting steps in the biosynthesis of illudin compounds.[10] Then, through multiple sequence alignment, we found that the 15 Δ6-protoilludene synthases also contain many unique conserved motifs, such as RXGCDLMNLFFVXDEXXD and GNDXXSYNXEQXRGDDXHN upstream and downstream of the canonical terpene synthase motifs, as well as CDFNLLASLAY, VVXQAXDR, YXXXRRXTIGAKPSFA, and GLGNWVRANDXWSFESXRYF, among others (see the Supporting Information). Finally, a 344-amino acid Δ6-protoilludene synthase homology model sequence was assembled using the method of this study (Figure 5B). This model can be used to look for more Δ6-protoilludene synthase candidate genes and corresponding gene clusters in the JGI genome database, which could provide rich enzyme resources for studying the biosynthetic pathways of illudins with antitumor and antimicrobial effects and might even yield intermediates with better titers.
Figure 5
Gene structure-phylogenetic tree and homology model of Δ6-protoilludene synthases. (A) Gene structure-phylogenetic tree of 15 characterized Δ6-protoilludene synthases (see the Supporting Information). (B) Homology model of Δ6-protoilludene synthases. The modeling method is the same as for γ-CS; red represents 100% conservation, and blue represents >80% conservation of amino acids.
In addition, (−)-α-bisabolol, a monocyclic sesquiterpene alcohol and one of the four stereoisomers of α-bisabolol [(+)-α-bisabolol,[40] (−)-α-bisabolol,[28,41−44] (+)-epi-α-bisabolol,[45] and (−)-epi-α-bisabolol[46]], is highly valued and widely used as an active ingredient in the cosmetics and pharmaceutical industries.[28,47] The productivity of (−)-α-bisabolol for industrial production has been improved by using engineered microbes and highly efficient BOSs. To obtain a BOS with better activity and selectivity, researchers combined three strategies, (1) homology searches with the identified enzymes, (2) construction of a phylogenetic tree to narrow the candidates to a smaller clade, and (3) multiple sequence alignment, and finally obtained the higher-yielding CcBOS from Cynara cardunculus var. scolymus.[28] The process used to identify this new BOS is similar to our method, so a BOS homology model can be constructed to discover more novel and more active BOS enzymes. The BOS homology model (Figure 6B) was built from a sequence alignment of BOSs (Figure 6A), which shows that BOSs also have unique motifs, such as SIWGDCFL, DDXXDXYGXYEELXXFTXAXERWSIXCLDXXPEYMK, and HKEEQER. At the same time, differences at conserved sites in the amino acid sequences can be used to study the key active sites of BOS and to develop more selective and active BOSs by site-directed mutagenesis. More detailed information on the BOS homology model will be provided, and, combined with bioinformatics technology, the identification of BOSs will also be simplified. The discovery of a new type of BOS indicates that there may be more highly conserved TS motifs in the plant kingdom than in Basidiomycetes,[28] which requires further study. Moreover, these results show that the method of this study is also applicable to the plant kingdom. As the number of characterized TSs increases, the number of homology models is expected to grow, making it easier to identify the enzymes we need. The growing number of TS characterizations can also promote the development of bioinformatics tools and make the characterization of new TSs more convenient.
Figure 6
Sequence alignment and homology model of BOSs. (A) Amino acid sequence alignments of four BOSs: MrBOS, Matricaria recutita (AIG92846.1); AaBOS, Artemisia annua (AFV40969.1); EeBOS, E. erythropappus (DC.) MacLeish (AYJ71561.1); and CcBOS, C. cardunculus var. scolymus (XP_024994640.1). Amino acids range from low (blue) to high (red) conservation. (B) Homology model of BOSs. The modeling method is the same as for γ-CS; red represents 100% conservation.
Conclusions
Through gene structure analysis, phylogenetic tree construction, and multiple sequence alignment, we predicted and selected three interesting genes in G. lucidum and G. sinensis for cloning and characterization. GC–MS results showed that all three genes encode highly specific γ-CSs, consistent with the results of previous studies. Based on the high conservation of the sequences, we constructed γ-CS homology model 1 and established a one-to-one correspondence between the conserved residue sites of the sequence and the γ-cadinene product. To make the verification more convincing, we used homology model 1 to retrieve 67 highly conserved sequences from NCBI and, selecting sequences as similar as possible to the conserved residues, chose five genes for functional verification; all five (100%) were γ-CSs. We hope that this sequence homology modeling method can be used to find more efficient enzymes, such as BOSs and Δ6-protoilludene synthases. We also hope that γ-CS will find new applications in future research and that this work will encourage researchers to construct other sequence homology models to improve enzyme productivity.
Methods
Strains and Growth Conditions
G. lucidum and G. sinense strains were obtained and grown as described
previously.[20,21] E. coli DH5α was used for gene amplification and plasmid
cloning, and E. coli Rosetta (DE3) was used for protein expression and
sesquiterpenoid production. Cells were cultured in lysogeny broth (LB)
medium (10 g/L tryptone, 5 g/L yeast extract, and 10 g/L NaCl)
supplemented, as appropriate, with ampicillin (50 μg mL⁻¹),
chloramphenicol (25 μg mL⁻¹), and/or kanamycin (30 μg mL⁻¹).
Isopropyl-β-D-1-thiogalactopyranoside (IPTG) was used to induce the
expression of genes involved in sesquiterpenoid synthesis.
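The medium recipe and working concentrations above reduce to simple scaling and C1V1 = C2V2 dilution arithmetic. The Python sketch below illustrates that arithmetic only; the helper names and the stock concentrations used in the example (100 mg/mL ampicillin, 1 M IPTG) are assumptions for illustration, not values reported in this study.

```python
from __future__ import annotations

# LB composition from the text, in grams per liter.
LB_G_PER_L = {"tryptone": 10.0, "yeast extract": 5.0, "NaCl": 10.0}


def lb_masses(volume_ml: float) -> dict[str, float]:
    """Grams of each LB component needed for volume_ml of medium."""
    return {name: g_per_l * volume_ml / 1000.0 for name, g_per_l in LB_G_PER_L.items()}


def stock_volume(final_conc: float, stock_conc: float, final_volume: float) -> float:
    """Solve C1*V1 = C2*V2 for the stock volume V1 (concentrations in the same units)."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the working solution")
    return final_conc * final_volume / stock_conc


print(lb_masses(500))  # grams of each component for 500 mL of LB
# Hypothetical 100 mg/mL (100,000 ug/mL) ampicillin stock -> 50 ug/mL in 500 mL.
print(stock_volume(50, 100_000, 500), "mL ampicillin stock")
# Hypothetical 1 M (1000 mM) IPTG stock -> 0.5 mM final in a 50 mL culture.
print(stock_volume(0.5, 1000, 50), "mL IPTG stock")
```

Keeping concentrations in matching units (µg/mL with µg/mL, mM with mM) is the only requirement of the dilution helper.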
Gene Predictions and Selection
STS gene family annotation, alignment, structure prediction, and manual
correction were performed as described previously.[20,21] Briefly, four
programs were used for ab initio gene identification: Augustus, GeneMark,
Fgenesh, and SNAP. STS gene predictions were manually corrected using
Apollo software. Two STS families were aligned with functionally
characterized basidiomycetous STSs, and a neighbor-joining phylogenetic
tree was then constructed using MEGA X.[48] In addition, a combined gene
structure and phylogenetic tree diagram was generated with GSDS 2.0[49]
(http://gsds.gao-lab.org/) to select genes of interest.
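For readers unfamiliar with the neighbor-joining step, the sketch below implements the textbook Saitou–Nei algorithm on a precomputed distance matrix and returns an unrooted Newick string. It is a minimal illustration, not MEGA X's implementation: the function name and the toy five-taxon additive matrix are hypothetical, and a real analysis would derive distances from the aligned STS sequences.

```python
from __future__ import annotations


def neighbor_joining(labels: list[str], dist: list[list[float]]) -> str:
    """Return an unrooted Newick tree built by Saitou-Nei neighbor joining."""
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(labels)
    d = [row[:] for row in dist]  # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair (i, j) that minimizes the Q criterion.
        best, bi, bj = None, -1, -1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the new internal node to the joined pair.
        li = d[bi][bj] / 2 + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        new_label = f"({nodes[bi]}:{li:g},{nodes[bj]}:{lj:g})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (bi, bj)]
        new_row = [(d[bi][k] + d[bj][k] - d[bi][bj]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_label]
    # Resolve the last three nodes around one central node.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


labels = ["a", "b", "c", "d", "e"]
toy = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(labels, toy))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because the toy matrix is additive, the branch lengths in the output reproduce every pairwise distance exactly; real sequence distances are only approximately additive.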
mRNA Extraction and cDNA Preparation
Mycelia from 7, 14, 21, and 28 days of plate culture were wrapped in tin
foil, frozen in liquid nitrogen, and stored at −80 °C. Total RNA was
extracted with the Universal Plant Total RNA Extraction Kit (Bioteke,
Beijing) according to the manufacturer’s protocol, omitting the kit's
DNase step, and was then treated separately with Recombinant DNase I
(RNase-free, Takara) to remove genomic DNA. Equal quantities of RNA from
the four culture ages were pooled and reverse-transcribed with the
PrimeScript II 1st Strand cDNA Synthesis Kit (Takara), using oligo(dT)
primers to synthesize single-stranded cDNA.
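Pooling equal quantities of RNA from the four time points means taking, from each sample, the volume that contains the same mass. A minimal sketch of that calculation, assuming hypothetical concentrations (ng/µL) and a hypothetical 250 ng contribution per sample; the helper name is illustrative only:

```python
from __future__ import annotations


def pooling_volumes(conc_ng_per_ul: dict[str, float], ng_each: float) -> dict[str, float]:
    """Volume (uL) of each RNA sample that contributes ng_each nanograms to the pool."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = ng_each / conc
    return volumes


# Hypothetical spectrophotometer readings (ng/uL) for the four culture ages.
readings = {"day7": 250.0, "day14": 400.0, "day21": 500.0, "day28": 125.0}
volumes = pooling_volumes(readings, ng_each=250.0)
print(volumes)  # {'day7': 1.0, 'day14': 0.625, 'day21': 0.5, 'day28': 2.0}
print(sum(volumes.values()), "uL pooled, 1000 ng total RNA")
```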
Cloning of STS Genes
Based on the predicted candidate gene sequences, the three target coding
sequences were amplified with Pyrobest DNA Polymerase (Takara), according
to the manufacturer’s protocol, using 5′- and 3′-end-specific primers
carrying restriction sites (see the Supporting Information). Genes that
were difficult to amplify were amplified with Q5 Hot Start High-Fidelity
DNA Polymerase (New England Biolabs) to improve amplification efficiency.
The PCR products were purified, ligated into the pGEM-T Easy vector
(Promega), and transformed into competent E. coli DH5α cells (TIANGEN),
which were grown for 12–16 h at 37 °C on LB agar plates containing
50 μg/mL ampicillin. After blue–white screening, at least three positive
clones of each gene were selected and sent to GENEWIZ (Tianjin) for Sanger
sequencing. The sequencing reads were assembled against the predicted
sequences with SeqMan software, and mismatched regions were manually
corrected in Apollo to obtain the experimentally confirmed open reading
frames.
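The primer-tailing and ORF-checking steps can be sketched as follows, under stated assumptions. The actual primers and restriction sites are given only in the Supporting Information, so BamHI (GGATCC) and XhoI (CTCGAG), the two-base "GC" protective tail, the annealing length, and the toy coding sequence are placeholders. The sketch also checks that the chosen sites do not occur inside the insert, and a simple forward-strand longest-ORF scan illustrates the kind of check involved in confirming an ORF; it is not how SeqMan or Apollo work.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tailed_primers(cds: str, fwd_site: str, rev_site: str,
                   anneal_len: int = 20, clamp: str = "GC") -> tuple[str, str]:
    """Primers = protective bases + restriction site + gene-specific part.

    Assumes palindromic sites, so each site is written 5'->3' unchanged.
    """
    cds = cds.upper()
    fwd = clamp + fwd_site + cds[:anneal_len]
    rev = clamp + rev_site + reverse_complement(cds[-anneal_len:])
    return fwd, rev


def internal_site_hits(cds: str, sites: list[str]) -> list[str]:
    """Restriction sites that also occur inside the insert (on either strand)."""
    cds = cds.upper()
    return [s for s in sites if s in cds or reverse_complement(s) in cds]


def longest_orf(seq: str) -> str:
    """Longest ATG...stop ORF on the forward strand, over all three frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:pos + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


cds = "ATGGCTAAAGGTCTGTTCGAACGTTAA"  # toy coding sequence, not an STS gene
fwd, rev = tailed_primers(cds, "GGATCC", "CTCGAG", anneal_len=12)
print(fwd)  # GCGGATCCATGGCTAAAGGT
print(rev)  # GCCTCGAGTTAACGTTCGAA
print(internal_site_hits(cds, ["GGATCC", "CTCGAG"]))  # []
print(longest_orf("CC" + cds + "GG") == cds)  # True
```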
Heterologous Expression of STSs
The STS genes were ligated into the pET-32a(+) expression vector via the
restriction sites, and the resulting plasmids were transformed into E. coli
Rosetta (DE3) (TIANGEN) for heterologous protein expression. When the
culture reached an OD600 of 0.8, IPTG was added to a final concentration
of 0.5 mM, and protein expression was induced at 18 °C for 24 h. The
thioredoxin tag encoded by pET-32a(+) can promote STS expression and
solubility. The Rosetta (DE3)[50] strain carries the pRARE plasmid, which
supplies tRNAs for the rare codons AUA, AGG, AGA, CUA, CCC, and GGA and
thereby enhances the expression of heterologous proteins.
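Because Rosetta (DE3) compensates for specific rare codons, it can be useful to see how often a candidate STS coding sequence uses them. The sketch below simply lists in-frame occurrences of the six codons named above; the function name and the toy sequence are hypothetical, and it makes no claim about how many rare codons impair expression.

```python
from __future__ import annotations

# Rare codons whose tRNAs pRARE supplies, per the text, in the DNA alphabet.
PRARE_CODONS = {c.replace("U", "T") for c in ("AUA", "AGG", "AGA", "CUA", "CCC", "GGA")}


def rare_codon_positions(cds: str) -> list[tuple[int, str]]:
    """1-based codon positions (frame starting at the first base) using a pRARE codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon in PRARE_CODONS:
            hits.append((i // 3 + 1, codon))
    return hits


print(rare_codon_positions("ATGAGGCCCGCTATATAA"))  # [(2, 'AGG'), (3, 'CCC'), (5, 'ATA')]
```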
GC–MS Analysis of Terpenoids
After 8 mL of culture was centrifuged, the supernatant was quickly
transferred to a 20 mL SPME vial (GERSTEL). Sampling was fully automated
with an MPS MultiPurpose Sampler (GERSTEL), programmed as follows:
incubation at 50 °C for 20 min, extraction for 15 min with an SPME fiber
(50/30 μm DVB/CAR/PDMS, Supelco, USA), desorption for 5 min, and fiber
cleanup for 3 min before and after injection. Volatile terpenoids were
analyzed on a 7890B-7000D GC–MS system (Agilent Technologies). Samples
were desorbed in the injection port at 250 °C in splitless mode and
carried by helium onto an HP-5MS column (30 m × 250 μm × 0.25 μm). The
GC oven temperature program was as follows: 60 °C held for 2 min, ramped
at 10 °C/min to 150 °C, at 2 °C/min to 160 °C, and at 15 °C/min to
250 °C, then held for 3 min (total GC program time: 25 min). Mass spectra
were scanned over m/z 30–500 with a scan time of 300 ms. Compounds were
identified by comparing their electron ionization (EI) mass spectra with
terpenoid reference spectra in the National Institute of Standards and
Technology (NIST) database and with spectra reported in the literature,
by comparing retention indices determined with a C8–C20 alkane mix
against published values, and by comparison with the previously
characterized STC9.
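Two pieces of arithmetic underlie this paragraph: the oven program's stated 25 min total (2 + 9 + 5 + 6 + 3 min), and the retention indices obtained from the C8–C20 alkane ladder. The sketch below reproduces the first from the program in the text and computes a linear (van den Dool–Kratz) retention index, the usual form for temperature-programmed runs. Whether the authors used this exact formula is an assumption, and the alkane retention times in the example are invented.

```python
from __future__ import annotations


def oven_program_minutes(start_temp: float, initial_hold: float,
                         ramps: list[tuple[float, float, float]]) -> float:
    """Total run time: initial hold plus, for each (rate, target, hold), ramp time + hold."""
    total, temp = initial_hold, start_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold
        temp = target
    return total


# Program from the text: 60 C (2 min); 10 C/min to 150; 2 C/min to 160; 15 C/min to 250 (3 min).
PROGRAM = [(10.0, 150.0, 0.0), (2.0, 160.0, 0.0), (15.0, 250.0, 3.0)]


def linear_retention_index(rt: float, alkanes: dict[int, float]) -> float:
    """Linear retention index; alkanes maps carbon number -> n-alkane retention time."""
    carbons = sorted(alkanes)
    for n, n_next in zip(carbons, carbons[1:]):
        t_n, t_next = alkanes[n], alkanes[n_next]
        if t_n <= rt <= t_next:
            # Interpolate between the bracketing alkanes; 100 index units per carbon.
            return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
    raise ValueError("retention time outside the alkane ladder")


print(oven_program_minutes(60.0, 2.0, PROGRAM))  # 25.0
# Invented alkane retention times (min): C15 at 10.0, C16 at 12.0.
print(linear_retention_index(11.0, {15: 10.0, 16: 12.0}))  # 1550.0
```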
Model Building and Homology Search
The STSs from this study were compared with published, functionally
characterized STSs of the same function using CLC Genomics Workbench 12
software. Conserved regions were identified by multiple sequence alignment
with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and edited
in Jalview. The most frequent amino acid residue at each aligned position
was then combined into a new protein sequence. Finally, similar sequences
were retrieved with NCBI Protein BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome),
and the Joint Genome Institute Fungal Genomics Program portal (http://genome.jgi-psf.org/programs/fungi/index.jsf) was searched to verify the homology model.
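The consensus step, combining the most frequent residue at each aligned position into a new sequence, can be sketched in a few lines. This is an illustration under assumptions: the alignment is a toy, ties are resolved by first occurrence (`Counter.most_common`), and columns whose majority state is a gap are dropped. The per-column fraction it returns is one simple way to express the conservation levels shaded in the figures (1.0 = 100%).

```python
from __future__ import annotations

from collections import Counter


def consensus(alignment: list[str], gap: str = "-") -> tuple[str, list[float]]:
    """Most frequent residue per column, plus the fraction of sequences sharing it."""
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    seq, conservation = [], []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        conservation.append(count / len(column))
        if residue != gap:  # drop columns where a gap is the majority state
            seq.append(residue)
    return "".join(seq), conservation


toy = ["DDLTD", "DDMTD", "DEL-D"]  # toy alignment, not real STS sequences
seq, cons = consensus(toy)
print(seq)  # DDLTD
print([round(c, 2) for c in cons])  # [1.0, 0.67, 0.67, 0.67, 1.0]
```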