Qi Zhang1, Xiao Yang, Huan Wang, Wilfred A van der Donk. 1. Department of Chemistry, Howard Hughes Medical Institute, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, United States.
Abstract
Lanthionine-containing peptides (lanthipeptides) are a rapidly growing family of polycyclic peptide natural products belonging to the large class of ribosomally synthesized and post-translationally modified peptides (RiPPs). These compounds are widely distributed in taxonomically distant species, and their biosynthetic systems and biological activities are diverse. A unique example of lanthipeptide biosynthesis is the prochlorosin synthetase ProcM from the marine cyanobacterium Prochlorococcus MIT9313, which transforms up to 29 different precursor peptides (ProcAs) into a library of lanthipeptides called prochlorosins (Pcns) with highly diverse sequences and ring topologies. Here, we show that many ProcM-like enzymes from a variety of bacteria have the capacity to carry out post-translational modifications on highly diverse precursor peptides, providing new examples of natural combinatorial biosynthesis. We also demonstrate that the leader peptides come from different evolutionary origins, suggesting that the combinatorial biosynthesis is tied to the enzyme and not a specific type of leader peptide. For some precursor peptides encoded in the genomes, the leader peptides apparently have been truncated at the N-termini, and we show that these N-terminally truncated peptides are still substrates of the enzymes. Consistent with this hypothesis, we demonstrate that about two-thirds of the ProcA N-terminal sequence is not essential for ProcM activity. Our results also highlight the potential of exploring this class of natural products by genome mining and bioengineering.
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a major class of
natural products, as revealed by the genome sequencing efforts of
the past decade.[1] These compounds are produced
in all three domains of life and possess vast structural diversity.
Among the best-studied RiPPs are lanthipeptides, a class of compounds
that are distinguished by the presence of thioether cross-linked amino
acids named lanthionines and methyllanthionines.[2−7] Many lanthipeptides, such as the commercially used food preservative
nisin, have potent antimicrobial activity and are termed lantibiotics.[8,9] Lanthipeptides are widely distributed among taxonomically distant
species[10] and are currently grouped into
four distinct classes according to their biosynthetic machineries.[3,10] Like all RiPPs, lanthipeptides are generated from a linear precursor
peptide, which is generically termed LanA. This precursor peptide
consists of a C-terminal core peptide, where all post-translational
modifications take place, and an N-terminal leader peptide that is
important for post-translational modification and that is subsequently
removed by proteolysis (Figure 1).[1,11] The installation of the (methyl)lanthionine thioether bridges is
achieved by the initial dehydration of Ser and Thr residues in the
precursor peptides, followed by stereoselective intramolecular Michael-type
addition of Cys thiols to the newly formed dehydroamino acids (Figure 1).
Figure 1
Schematic representation of the biosynthetic pathway of
lanthipeptides
exemplified by prochlorosin 2.8. A shorthand notation for lanthionine
structures is shown in the box. Leader and core peptides are not shown
in proportion to their actual lengths.
An intriguing example of a lanthipeptide synthetase is ProcM, a
class II enzyme (generically termed LanM) from the planktonic marine
cyanobacterium Prochlorococcus MIT9313.[12] ProcM acts on up to 29 different precursor peptide
substrates (ProcAs) and produces a library of lanthipeptides termed
prochlorosins (Pcns) that possess highly diverse sequences and ring
topologies,[12,13] representing a remarkable example
of natural combinatorial biosynthesis. The biological role of Pcns
is currently elusive, but they are believed to be functional, as they
were found to be produced in the host strain, and their biosynthetic
genes were transcribed in response to changes in environmental conditions.[12]
The intriguing combinatorial biosynthesis of Pcns provides an interesting
model to investigate the evolution of natural product diversity and
the molecular origins for the remarkable substrate tolerance displayed
by the enzyme. The ProcA substrates have an unusually long leader
peptide compared to that of other lanthipeptide substrates, raising
the question of whether this longer leader peptide might be correlated
with the large diversity of substrates that ProcM processes. The ProcA
leader peptides are also unique in that they have sequence homology
with the Nif11 proteins.[14] The exact function
of the Nif11 proteins is not known, but they are thought to play a
role in nitrogen fixation, as their genes cluster with other nitrogen
fixation genes.[15] An alternative model
that has been proposed is that it is the cyclization active site of
ProcM that is unique and that confers the ability to cyclize a wide
variety of substrates. Here, we present bioinformatic and biochemical
investigations on lanthipeptide biosynthetic systems employing ProcM-like
enzymes. We show that the precursor peptides for lanthipeptide biosynthesis
are highly divergent among different biosynthetic systems and that
many ProcM-like lanthipeptide synthetases can engage in combinatorial
biosynthesis by tolerating precursor peptides with highly diverse
core sequences.
Results and Discussion
Genome Mining of LanAs Associated with ProcM Analogues
ProcM catalyzes both dehydration
and cyclization reactions to transform
the linear ProcA peptides into a panel of Pcns. The enzyme contains
a conserved CCG motif, suggesting that it likely uses three Cys residues
for binding of an active site zinc ion,[10] unlike other lanthipeptide cyclases that utilize two Cys and one
His as zinc ligands. This zinc site has been shown to be important
for the cyclization reaction.[16,17] The presence of three
Cys ligands may account in part for the high substrate tolerance of
ProcM, as model studies have demonstrated that the reactivity of thiolate
nucleophiles ligated to Zn2+ is enhanced with an increased
number of thiolate ligands.[18] We previously
have shown that LanMs containing the CCG motif cluster together to
form a distinct subclade in the LanM phylogenetic tree, suggesting
that these ProcM-like enzymes evolved independently.[10] To investigate whether other ProcM-like enzymes also have
multiple structurally diverse LanA substrates, we performed a genome-wide
examination of the LanAs associated with ProcM analogues. This investigation
showed that similar lanthipeptide synthetases can have very diverse
precursor peptides, which vary significantly in both the number of
putative substrates and their amino acid sequence (Figure 2A and Supporting Information
Table 1 for a detailed list). Unlike ProcM, many ProcM analogues
have only a single LanA substrate, as found for most lanthipeptides.
On the other hand, many other ProcM-like enzymes have several substrates,
harboring either very similar or highly diverse core peptide sequences
(Figure 2A and Supporting
Information Table 1). A notable example is a ProcM analogue
from Prochlorococcus MIT9303, which
has almost the same sequence as ProcM (95% identity, 97% similarity).
This organism encodes 15 putative substrate peptides (Figure 2A), and none of these substrates have a core peptide
similar in sequence to the core peptides of the 29 ProcAs. This observation
raises the possibility that many, or all, Pcns may lack a biological
function and that they may represent intermediaries during evolution.
Conversely, these two lanthipeptide biosynthetic systems may represent
a remarkable example of convergent evolution of functional lanthipeptides,
if the structurally diverse peptides fulfill similar roles in the
closely related species.
Figure 2
Genome mining of precursor peptide genes associated
with ProcM-like
enzymes. (A) Bayesian MCMC phylogram of ProcM-like enzymes (protein
sequence) and a summary of the number of their putative LanA substrates.
The lacticin 481 synthetase LctM and nukacin synthetase NukM were
used as an outgroup for Bayesian MCMC analysis, which is shown as
an orange triangle. The detailed Bayesian MCMC tree is shown in Supporting Information Figure 29. The putative lanA genes were categorized into two groups based on whether
they are spatially close to their associated lanM genes. For LanAs that lack Cys residues, the substrates are shown
as total number of lanAs/the number of lanAs that do not code for Cys. If an enzyme had multiple LanA substrates,
then the core peptide sequences were aligned to examine whether these
precursor peptides are similar (S, for which the Ser/Thr and Cys residues
are aligned well) or diverse (D, for which Ser/Thr and Cys residues
did not align well). ProcM from Prochlorococcus MIT9313, NpnM from Nostoc punctiforme PCC 73102, and four LanMs (CyanM1–4) from Cyanothece sp. PCC 7425 are highlighted in red, blue,
and green, respectively. CyanM1 is highlighted by an asterisk. Three
groups of substrates shown in blue contain leader peptides that share
very weak similarities with the N11P family (1 × 10^−4 < e-value < 0.1). NA indicates that the precursors do not
belong to any known protein family. (B) Sequence alignment of CyanA1.1–1.3,
showing that CyanA1.1 and CyanA1.3 may have been truncated at their
N-termini. Alternatively, the open reading frame (ORF) annotation
of CyanA1.2 could be incorrect, and its translation start codon may
instead be at the light brown arrow, like CyanA1.1 and 1.3. Completely
conserved and highly conserved residues in the leader peptides are
shown in black and gray boxes, respectively. Ser/Thr and Cys residues
in the core peptides are shown in blue and red boxes, respectively.
The proteolytic cleavage site is indicated by a green arrow. For detailed
information on precursor peptide sequence and the procedures for bioinformatics
analysis, see Supporting Information Table 1 and
Supporting Information Methods.
The vast majority of ProcM-like enzymes with the CCG motif were
found in cyanobacteria, a rich source of RiPP natural products,[19] but they are also present in other phyla (Figure 2A). The precursor peptide genes associated with
these ProcM-like synthetases appear to have different evolutionary
origins. Many of the LanA leader peptides are members of the N11P
family, but several other leader peptides belong to the nitrile hydratase
leader peptide (NHLP) family (Figure 2A).[14] NHLP is highly similar to the α subunit
of nitrile hydratase (NHase) but lacks about 30 amino acids in the
middle of the NHase sequence that are important for binding of a catalytic
metal ion.[14,20] We note that many cyanobacteria
(e.g., Prochlorococcus strains) do
not have Nif11 and/or NHase genes and therefore the evolutionary relationship
between these enzymes and the leader peptides is not clear. In addition
to the N11P and NHLP leader peptides, the LanAs from the planktonic
cyanobacterium Synechocystis sp. PCC
7509 have leader peptides that belong to the TIGR03898 family (Figure 2A).[21] This family mostly
consists of the leader peptides from LanAs in firmicutes (e.g., MrsA
in mersacidin biosynthesis). The five LanAs in Synechocystis sp. PCC 7509 may thus be a result of horizontal gene transfer, possibly
from a firmicute. Furthermore, many of the leader peptides of LanAs
associated with ProcM-like enzymes do not belong to any known protein
family (Figure 2A), illustrating the diverse
origins of lanthipeptide precursor genes that cluster with the lanM genes in the ProcM clade.
Another interesting observation is that some LanAs may have been
truncated at their N-termini. For example, three putative precursor
genes (cyanA1.1–1.3) were found adjacent to
a ProcM-like LanM gene (cyanM1) in the genome of
the cyanobacterium Cyanothece sp. PCC
7425. Compared with CyanA1.2, CyanA1.1 and CyanA1.3 appear to be much
shorter in length (Figure 2B). Analysis of
the 5′-untranslated regions (UTRs) of CyanA1.1 and CyanA1.3
showed that their sequences are very similar to the coding region
of the N-terminus of CyanA1.2 and that the putative truncation in
CyanA1.1 might be caused by a point mutation resulting in a stop codon
in the 5′-UTR (Supporting Information Figure
1). An alternative interpretation is that the longer CyanA1.2
is based on an incorrectly assigned start codon; translation initiation
at a later start codon would result in a peptide that has the same
length as that of CyanA1.1 and CyanA1.3 (Figure 2B). We did observe what appears to be a true N-terminally truncated procA gene (here named procAt.1) in the
genome of Prochlorococcus MIT9313,
which escaped our previous annotation for procA genes.[12] The truncation of ProcAt.1 seems to be caused
by the transformation of the original start codon to an ochre stop
codon, resulting in translation initiation at a different downstream
start codon (Supporting Information Figure 2).
Much of the N-Termini of ProcAs Are Dispensable for ProcM Activity
Unlike the newly identified ProcAt.1, whose leader peptide has
45 amino acids, all of the other ProcA leader peptides have more than
60 amino acids (Supporting Information Figure
3). As mentioned above, ProcA leader peptides belong to the
N11P family[14] and are much longer than
those of other class II lantibiotics (e.g., the leader peptide of LctA
for lacticin 481 biosynthesis has only
23 residues[22,23]). It has been shown that the
lacticin 481 synthetase LctM recognizes both the LctA leader and core
peptides[24] and thus one possible role of
the much longer leader peptides of ProcAs is that they provide additional
recognition elements for ProcM, allowing for the remarkably high substrate
tolerance. In this scenario, N-terminally truncated LanAs are likely
the vestiges of precursor peptide divergence during evolution and
might not necessarily be real substrates. To test whether ProcAt.1
is a ProcM substrate, we coexpressed ProcAt.1 with ProcM in Escherichia coli, and the resulting peptide was digested
by endoprotease Glu-C. High-resolution matrix-assisted laser desorption/ionization
time-of-flight (MALDI-ToF) mass spectrometry (MS) analysis showed
that ProcAt.1 had been dehydrated up to three times (Figure 3 and Supporting Information
Figure 4). N-Ethylmaleimide (NEM) derivatization,
which was employed to derivatize any free cysteine residues, showed
that although lanthionine formation for the 2-fold dehydration product
is incomplete, the 3-fold dehydration product is fully cyclized (Figure 3), strongly suggesting that ProcAt.1 is a true substrate
of ProcM.
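The mass bookkeeping behind this kind of MS readout can be sketched in a few lines. This is a minimal illustration, not the analysis code used in this study. It assumes that each dehydration removes one water (about 18.011 Da, monoisotopic), that thioether ring formation by Michael-type addition is mass-neutral, and that each free Cys thiol labeled by NEM gains about 125.048 Da. The function name and the example precursor mass are hypothetical.

```python
# Monoisotopic masses in Da (standard values).
WATER = 18.010565
NEM = 125.047679  # N-ethylmaleimide, C6H7NO2


def expected_mass(unmodified_mass, n_dehydrations, n_free_cys=0):
    """Expected mass after dehydration and NEM labeling of free Cys.

    Cyclization (thiol addition to a dehydroamino acid) does not change
    the mass, so only dehydrations and NEM adducts contribute.
    """
    if n_dehydrations < 0 or n_free_cys < 0:
        raise ValueError("counts must be non-negative")
    return unmodified_mass - n_dehydrations * WATER + n_free_cys * NEM


# Hypothetical peptide fragment of 3000.000 Da carrying three dehydrations.
base = 3000.000
fully_cyclized = expected_mass(base, 3, n_free_cys=0)  # no NEM adduct expected
one_free_cys = expected_mass(base, 3, n_free_cys=1)    # one ring not formed
print(round(fully_cyclized, 3))
print(round(one_free_cys - fully_cyclized, 3))
```

Under these assumptions, a dehydrated peptide that shows no NEM adduct has no free thiols left, which is the reasoning used above to conclude that the 3-fold dehydrated product is fully cyclized.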
Figure 3
Modification of ProcAt.1 by ProcM. (A) MALDI-ToF-MS analysis of
ProcAt.1 that was obtained by coexpression with ProcM and treated
with endoproteinase Glu-C (trace i) and subsequently derivatized by
NEM (trace ii). (B) Sequence of ProcAt.1 modified by ProcM and treated
with Glu-C. The ESI-MS/MS fragmentation pattern for the 3-fold dehydrated
species is shown (the MS/MS data is presented in Supporting Information Figure 4).
To further interrogate the importance of the full-length ProcA
leader peptide, we made a series of ProcA2.8 mutants lacking 30, 40,
and 50 N-terminal amino acids (Figure 4A) and
coexpressed these mutants with ProcM in E. coli. MALDI-ToF-MS analysis clearly showed that mutants lacking the 30
and 40 N-terminal residues were fully dehydrated by ProcM (Figure 4B,C). The peptides were subsequently digested by
endoprotease Asp-N and subjected to NEM derivatization. Compared with
the control peptides that were expressed in the absence of ProcM and
that were fully derivatized by NEM on their two free Cys residues,
the ProcM-modified peptides did not react with NEM (Figure 4D,E), indicating that these peptides were fully
cyclized. MS/MS analysis confirmed that the correct lanthionine rings
of ProcA2.8 were produced in both truncated substrates (Supporting Information Figure 5). Hence, ProcM
is able to perform catalysis not only on substrates with dramatically
varied core sequences but also on mutants that are significantly truncated
in the leader peptide. Although the dispensability of part of the
N-terminal leader peptides has previously been shown in the biosynthesis
of the lantibiotic lacticin 481,[23] the
lasso peptide MccJ25,[25] and the biosynthetic
enzymes of cyanobactin biosynthesis,[26−28] the fact that about
two-thirds of the leader peptide is dispensable for ProcM activity
is surprising considering the high substrate tolerance of the enzyme.
The possibility that the long ProcA leader peptides are correlated
with ProcM’s tolerance of highly varied core sequences is thus
not supported by our studies. Whether ProcM could modify the peptide
lacking the 50 N-terminal residues could not be determined because
this mutant peptide could not be obtained, regardless of whether the
peptide was coexpressed with ProcM or not, suggesting that a certain
minimum length of ProcA2.8 might be necessary for peptide stability,
at least in E. coli.
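The naming of the truncation variants follows directly from the number of residues removed, as the short sketch below illustrates. It assumes an 82-residue precursor, as implied by the variant labels in Figure 4, and uses a placeholder sequence because the ProcA2.8 sequence is not reproduced here. The helper name is hypothetical.

```python
def truncation_variant(name, sequence, n_removed):
    """Return (label, sequence) for a variant lacking n_removed N-terminal residues.

    Labels use the residue-range style of Figure 4, e.g. ProcA2.8-(31–82).
    """
    if not 0 <= n_removed < len(sequence):
        raise ValueError("n_removed must be smaller than the peptide length")
    first = n_removed + 1  # first residue retained, 1-based numbering
    label = f"{name}-({first}–{len(sequence)})"
    return label, sequence[n_removed:]


# Placeholder 82-residue sequence (assumption; not the real ProcA2.8 sequence).
placeholder = "X" * 82
variants = [truncation_variant("ProcA2.8", placeholder, n) for n in (30, 40, 50)]
for label, seq in variants:
    print(label, len(seq))
```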
Figure 4
ProcM modification of
truncated ProcA2.8 derivatives. (A) Sequence
of ProcA2.8 and schematic representation of the truncation variants
discussed in this study. The purple arrow shows the physiological
proteolytic cleavage site for leader peptide removal. The blue arrow
shows the endoprotease Asp-N site that was used in this study to shorten
the peptide and allow better analysis of the post-translational modifications
in the core peptide. (B) MALDI-ToF MS analysis of ProcA2.8-(31–82)
that was obtained either by expressing the peptide alone (trace i)
or by coexpression with ProcM (trace ii). (C) MALDI-ToF MS analysis
of ProcA2.8-(41–82), presented in the same manner as for ProcA2.8-(31–82)
in panel B. (D) ProcA2.8-(31–82) peptides (unmodified and modified)
were digested by Asp-N and subsequently treated with NEM. Trace i
shows the unmodified peptide before NEM derivatization, and trace
ii demonstrates complete NEM derivatization of this peptide. Traces
iii and iv show the ProcM-modified peptide before and after NEM treatment,
respectively. No derivatization of the modified peptides is observed,
strongly suggesting formation of lanthionine rings in the ProcM-modified
peptide. (E) MALDI-ToF MS analysis of Asp-N-digested ProcA2.8-(41–82).
The data are shown as in panel D. In all of the MS spectral data shown,
the signals corresponding to the unmodified and the ProcM-modified
peptides are highlighted in yellow and green, respectively, whereas
the NEM-derivatized peptides are highlighted in light blue. Some of
the nonhighlighted peaks are derived from proteolysis products of
the leader peptide and Asp-N.
Lanthipeptides from Cyanothece sp. PCC 7425
Since, at present, only the prototypical ProcM
has been shown to carry out combinatorial biosynthesis, we
investigated whether selected ProcM-like enzymes
are active with their associated LanA peptides, which have diverse leader
peptides. We first investigated a particularly interesting lanthipeptide
biosynthetic system in the marine cyanobacterium Cyanothece sp. PCC 7425, a strain that also produces several cyanobactins.[29] The genome of this organism encodes four LanMs
(here termed CyanM1–4) that constitute a separate phylogenetic
subclade (Figures 2A and 5A). Two sets of three lanA genes (cyanA1.1–1.3 and cyanA4.1–4.3) were found adjacent to cyanM1 and cyanM4, respectively, whereas cyanM2 has only a single lanA gene (cyanA2.0) nearby (Figure 5A). Multiple lanA genes (cyanA3.1–3.7) were found
adjacent to cyanM3. In addition, another locus approximately
2.8 Mbp away from cyanM3 encodes five more putative
LanAs (cyanA3.8–3.12) with leader peptides
that are very similar to those of CyanA3.1–3.7 but with lower
sequence homology to the leader peptides of the other CyanAs (Figure 5A and Supporting Information
Table 1). To correlate these precursor peptides with their
corresponding enzymes, a sequence similarity network was constructed
based on their leader sequences. The results show that CyanAs can
be divided into four groups (Figure 5B), suggesting
that they are substrates of four different enzymes.
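The grouping step of a sequence similarity network can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: it takes precomputed pairwise e-values, draws an edge when the e-value is more stringent (smaller) than a cutoff, and reports the connected components. The node names and e-values are hypothetical; the default cutoff of 1 × 10^−7 is the value given in the Figure 5 caption.

```python
from collections import defaultdict


def similarity_groups(nodes, pairwise_evalues, cutoff=1e-7):
    """Group nodes into connected components of a similarity network.

    An edge joins two nodes when their e-value is below the cutoff.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), evalue in pairwise_evalues.items():
        if evalue < cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical leader peptides and e-values, for illustration only.
nodes = ["A1.1", "A1.2", "A3.1", "A3.2", "A4.1"]
evalues = {
    ("A1.1", "A1.2"): 1e-20,
    ("A3.1", "A3.2"): 1e-30,
    ("A1.1", "A3.1"): 1e-3,   # too weak to draw an edge
    ("A4.1", "A3.1"): 0.5,    # too weak to draw an edge
}
groups = similarity_groups(nodes, evalues)
print(groups)
```

Tightening or loosening the cutoff changes how many groups appear, so the choice of threshold is part of the interpretation rather than a neutral parameter.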
Figure 5
Lanthipeptide biosynthesis
in Cyanothece sp. PCC 7425. (A) Four
lanthipeptide biosynthetic systems in Cyanothece sp. PCC 7425, showing the gene clusters
of each system and their locations in the genome. (B) Sequence similarity
network based on the leader peptide sequence of CyanAs. Each node
represents a leader peptide, and each edge (line) indicates a pair
of nodes (leader peptides) that have a BlastP e-value more stringent
than the cutoff value used (1 × 10^−7). Different
biosynthetic systems are depicted by different colors. (C) High level
of conservation in the N-terminal leader sequence and hypervariability
of the C-terminal core peptide of CyanA3.1–3.12. The GG/GA
protease cleavage site for leader peptide removal is marked by a green
arrow. For the sequences of CyanA1, CyanA2, and CyanA4, see Supporting Information Table 1.
CyanA1.1, CyanA1.2, and CyanA1.3 were first tested as potential
substrates for CyanM1 by coexpression of each cyanA1.1–1.3 individually with cyanM1 in E. coli. MALDI-ToF MS analysis showed that all three peptides were modified
by the enzyme (Supporting Information Figures
6–8). The 12 peptides CyanA3.1–3.12 have highly
conserved leader regions but very diverse core sequences (Figure 5C) and thus the CyanM3–CyanA3.x system could
be similar to the combinatorial biosynthetic system of the prochlorosins.
To test this hypothesis, the 12 genes, cyanA3.1–3.12, were each coexpressed individually with cyanM3, and MALDI-ToF MS analysis showed that all peptides were modified
by the enzyme (Supporting Information Figures
9–20), demonstrating very high substrate tolerance for
CyanM3, similar to the observations with ProcM.[12] Some precursor peptides have sequences that allowed cleavage
by commercially available proteases to remove most of the leader peptides
without proteolysis in the core peptide. MALDI-ToF MS analysis of
the proteolytic products clearly demonstrated that the dehydrations
take place in the core peptide (Supporting Information
Figures 16–18).
Given that CyanM1–4 form a distinct phylogenetic subclade,
we reasoned that the three other enzymes might also have high substrate
tolerance. In line with this proposal, we showed that CyanM4 can modify
not only its putative substrate CyanA4.1 (Supporting
Information Figure 21) but also a chimeric peptide consisting
of the CyanA4.1 leader and CyanA1.2 core peptide (Supporting Information Figure 22); the latter is very different
from the CyanA4.1 core peptide. However, CyanM4 did not modify CyanA1.2
(Supporting Information Figure 23), supporting
the model that it is the cognate leader peptide (i.e., the leader
peptide of CyanA4.1) that plays a requisite role for enzyme catalysis,[11] although probably not for substrate promiscuity.
LanAs That Do Not Have a Cysteine Residue
Cysteine
is a required residue for formation of the thioether bridges of lanthipeptides.
However, we noted that several putative LanAs associated with ProcM-like
enzymes only have Ser/Thr residues and lack any Cys residues (Figure 2A and Supporting Information
Table 1). These peptides include ProcA4.1, which is the only
Cys-free peptide among 30 ProcAs (including the newly characterized
ProcAt.1). To investigate whether ProcA4.1 is a ProcM substrate, the
peptide was coexpressed with ProcM, and MALDI-ToF-MS analysis showed
that the resulting peptide was dehydrated despite the lack of Cys
residues (Figure 6A and Supporting Information Figure 24). Since the 29 other ProcA
peptides all contain at least one Cys (Supporting
Information Figure 3), the absence of a Cys in ProcA4.1 likely
resulted from mutations in the core sequence of an ancestor LanA.
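A simple residue count is enough to flag core peptides that can be dehydrated but cannot form thioether rings. The sketch below is an illustration only, with hypothetical sequences and a hypothetical helper name. It relies on the fact that each (methyl)lanthionine ring consumes one dehydrated Ser/Thr and one Cys, so the number of rings cannot exceed either count.

```python
def ring_potential(core):
    """Count dehydratable residues (Ser/Thr) and Cys in a core peptide.

    max_rings is an upper bound on (methyl)lanthionine rings, since each
    ring needs one dehydrated Ser/Thr and one Cys.
    """
    core = core.upper()
    n_ser_thr = sum(core.count(residue) for residue in "ST")
    n_cys = core.count("C")
    return {"ser_thr": n_ser_thr, "cys": n_cys, "max_rings": min(n_ser_thr, n_cys)}


# Hypothetical core sequences, for illustration only.
examples = {"with_cys": "GSTCAATCSG", "cys_free": "GSTAATASG"}
results = {name: ring_potential(seq) for name, seq in examples.items()}
for name, summary in results.items():
    print(name, summary)
```

A peptide with Ser/Thr residues but `max_rings` equal to zero corresponds to the situation described above: dehydration remains possible, but no lanthionine can form.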
Figure 6
Coexpression
studies of Cys-lacking peptides with LanMs. (A) MALDI-ToF
MS analysis of ProcA4.1 that was obtained by coexpression with ProcM.
Also shown is the sequence of the ProcA4.1 core (obtained by TEV cleavage
of a ProcA4.1 mutant containing an engineered TEV cleavage site just
before the predicted core sequence) and the MS/MS fragmentation pattern
for the 3-fold dehydrated species. (B) MALDI-ToF MS analysis of NpnA3
that was obtained by coexpression with NpnM in E. coli. Also shown is the sequence of endoproteinase Glu-C cleaved NpnA3
and the MS/MS fragmentation pattern for the 4-fold dehydrated species.
(C) MALDI-ToF MS analysis of NpnA6 obtained similarly to that for
NpnA3 in panel B. Also presented is the sequence and MS/MS fragmentation
pattern for 3-fold dehydrated NpnA6. The MS/MS data for 3-fold dehydrated
ProcA4.1, 4-fold dehydrated NpnA3, and 3-fold dehydrated NpnA6 are
shown in Supporting Information Figures 24–26, respectively.
A more unusual example is found in N. punctiforme PCC 73102,
which encodes putative RiPP precursor peptides that have
very conserved putative leader peptides but highly diverse core peptides.
Among the six peptides identified, none has a Cys residue (Supporting Information Table 1), but the genes
encoding these peptides do cluster with a ProcM-like gene. Other genes
encoding known RiPP biosynthetic enzymes (e.g., cyclodehydratases
for thiazole/oxazole formation[14,30]) were not found in
the genome of the organism, suggesting the peptides are putative LanM
substrates. Notably, unlike ProcAs and CyanA3.x, whose leader peptides
belong to the N11P protein family, the leader peptides of these putative
RiPP precursors (here termed NpnAs) belong to the NHLP family (Figure 1A). To test whether these NHLP-containing and Cys-lacking
peptides are LanM substrates, two NpnAs with very different core peptides
(NpnA3 and NpnA6) were coexpressed with NpnM. MALDI-ToF MS analysis
of the resulting peptides showed that both NpnAs were dehydrated (Figure 6B,C and Supporting Information
Figures 25–26). Attempts to form a lanthionine by appending
a five-amino acid sequence containing a Cys to NpnA3 were unsuccessful
(Supporting Information Figure 27). Detailed
sequence analysis showed that NpnM may have an impaired cyclase active
site because the enzyme has a QG motif instead of a HG motif found
in almost all LanC and LanM proteins (Supporting
Information Figure 28). The His in this motif is essential
for cyclase activity of the class I lanthipeptide cyclase NisC,[31] and its mutation in NpnM may be a consequence
of its substrates no longer requiring cyclization. These observations
provide support for the model in which lanthipeptide synthetases coevolved
with their substrates during the evolutionary process.[10]

The function(s) of the Cys-lacking NpnAs is also an interesting
question, since the mature products cannot be lanthipeptides. As previously
noted,[32,33] a gene encoding a member of the zinc-dependent
dehydrogenase family (NpnJ) is located near the LanM gene. For the
lantibiotics lacticin 3147 and carnolysin, the enzymes LtnJ and CrnJ
convert Dha into d-Ala residues.[33,34] Hence, it is possible that the products of the gene cluster in N. punctiforme PCC 73102 are d-amino acid-containing
peptides, if NpnJ hydrogenates the dehydroamino acids formed by NpnM.
Unfortunately, we did not obtain soluble NpnJ by heterologous expression
in E. coli. Similarly, previous attempts
to complement a ΔltnJ mutant of Lactococcus lactis with npnJ were
unsuccessful, and insoluble expression could not be ruled out.[32]
Conclusions
The tremendous structural
and functional
diversity of lanthipeptides raises many questions regarding their
biological roles, origins, and evolutionary mechanisms.[35−38] The Pcn biosynthetic system is arguably one of the most remarkable
examples of combinatorial biosynthesis of natural products.
By genome mining for putative lanthipeptides whose biosyntheses involve
ProcM-like enzymes, we show that the LanA precursor peptides are highly
diverse among different systems and that phylogenetically closely
related lanthipeptide synthetases can be associated with very different
sets of substrates. In addition, we demonstrated that much of the
N-terminal ProcA leader peptide is not required for enzyme activity.

This work also extends the combinatorial biosynthesis paradigm
to many other organisms by experimental demonstration that ProcM-like
enzymes that have multiple nearby precursor peptides have the capacity
to process a widely diverse set of core peptides. Given the sequence
diversity of their leader peptides and the demonstration that much
of the leader peptide is dispensable, this substrate tolerance is
not imparted by the leader peptide and hence it is likely that it
is a property of the ProcM enzyme clade. The findings herein thus
highlight the potential of exploring lanthipeptides by genome mining
and for biosynthetic engineering efforts to produce novel lanthipeptides
with desired biological activities by mimicking the natural evolutionary
process.
Methods
Materials
All oligonucleotides were purchased from
Integrated DNA Technologies. Restriction endonucleases, DNA polymerases,
T4 DNA ligase, and endoproteinase Asp-N and Glu-C were purchased from
New England Biolabs. Media components for bacterial cultures were
purchased from Difco laboratories. Chemicals were purchased from Fisher
Scientific or from Aldrich unless noted otherwise. E. coli DH5α was used as a host for cloning and
plasmid propagation, and E. coli BL21
(DE3) was used as a host for coexpression. Cyanothece sp. PCC7425 was purchased from ATCC. A synthetic npnM gene was ordered from GeneArt, and synthetic npnA genes were ordered from Integrated DNA Technologies (IDT).
General Methods
All polymerase chain reactions (PCR)
were carried out on a C1000 thermal cycler (Bio-Rad). DNA sequencing
was performed by the Biotechnology Center at the University of Illinois
at Urbana–Champaign, using appropriate primers. MALDI-ToF MS
was carried out on a Bruker Daltonics Ultraflex MALDI ToF/ToF mass spectrometer.
ESI MS/MS analyses were performed on a Synapt ESI quadrupole ToF mass
spectrometry system (Waters). Deisotoping and deconvolution of ESI
MS/MS spectra were performed using the MaxEnt3 program (Waters). Detailed
procedures for cloning are described in the Supporting
Information. Primer sequences are included in Supporting Information Table 2.
Genome Mining of ProcM-like Enzymes and Their Associated LanA Substrates
To identify ProcM-like enzymes, BlastP searches
were performed using the ProcM protein sequence as the query. Hits
were selected with identity >30% and gap <8%. To identify putative lanAs, the annotated open reading frames (ORFs) around lanMs were inspected manually, and BlastP searches were
performed within this genome using either a ProcA leader peptide sequence,
the leader sequence from other LanAs identified in the same genome,
or the leader sequences from known LanAs as the queries. Detailed
genome mining procedures are described in the Supporting Information. LanA sequences are shown with their
corresponding LanM enzymes in Supporting Information
Table 1.
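The hit-selection criteria stated above (identity >30% and gap <8%) can be sketched as a simple filter over tabular BLAST output. This is a minimal illustration and not the procedure used in this work: the four-column layout (subject ID, percent identity, alignment length, total gaps) is an assumed custom tabular format, and the definition of gap percentage as gaps divided by alignment length is also an assumption, since the text does not specify how it was computed.

```python
# Illustrative sketch only: column layout and gap definition are assumptions.
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    pident: float  # percent identity reported for the alignment
    length: int    # alignment length
    gaps: int      # total number of gap positions in the alignment


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated rows: subject ID, % identity, length, gaps."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, pident, length, gaps = line.split("\t")[:4]
        yield Hit(sseqid, float(pident), int(length), int(gaps))


def select_hits(hits: Iterable[Hit],
                min_identity: float = 30.0,
                max_gap_pct: float = 8.0) -> List[Hit]:
    """Keep hits with identity > min_identity and gap % < max_gap_pct."""
    return [
        h for h in hits
        if h.length > 0
        and h.pident > min_identity
        and 100.0 * h.gaps / h.length < max_gap_pct
    ]
```

Hits retained by such a filter would still need the manual inspection of neighboring ORFs described above before any LanA assignment is made.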
Phylogenetic and Network Analysis
Bayesian MCMC inference
analyses were performed using the program MrBayes (version 3.2).[39] Final analyses consisted of two sets of eight
chains each (one cold and seven heated), run for about 2 million generations
with trees saved and parameters sampled every 100 generations. A mixed
amino acid model was used, and analyses were run until they reached convergence,
with a standard deviation of split frequencies <0.01. Posterior probabilities
were averaged over the final 75% of trees (25% burn in). Network analysis
was performed by BLAST searches comparing the CyanA leader peptides
against one another. A Matlab script was written to remove all duplicate
comparisons, and the result was imported into the Cytoscape software
package.[40] The nodes were arranged using
the yFiles organic layout provided with Cytoscape version 2.8.3.
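The removal of duplicate comparisons before network construction can be illustrated as follows. The original script was written in Matlab and is not reproduced here; this Python sketch is only an analogous example. Treating A-versus-B and B-versus-A as the same edge, discarding self-comparisons, and keeping the higher score for each pair are all assumptions, as the text does not state these details.

```python
# Illustrative sketch only: not the Matlab script used in this work.
from typing import Dict, Iterable, List, Tuple


def unique_pairs(
    comparisons: Iterable[Tuple[str, str, float]]
) -> List[Tuple[str, str, float]]:
    """Collapse reciprocal pairwise comparisons into one edge per pair.

    Each input item is (query, subject, score). Self-hits are dropped, and
    for a pair seen more than once the higher score is kept (an assumption).
    """
    best: Dict[Tuple[str, str], float] = {}
    for query, subject, score in comparisons:
        if query == subject:
            continue  # a peptide compared with itself carries no network edge
        key = tuple(sorted((query, subject)))  # order-independent pair key
        if key not in best or score > best[key]:
            best[key] = score
    return [(a, b, s) for (a, b), s in sorted(best.items())]
```

The resulting list has one row per unordered pair and could be written out as a simple edge table for import into a network viewer.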
General Procedure for LanA in Vivo Modification
Electrocompetent E. coli BL21 (DE3)
cells were transformed with the coexpression constructs. Cultures
were inoculated from single colony transformants and grown overnight
at 37 °C in 20 mL of LB broth supplemented with 50 mg/L kanamycin. The overnight culture was used to inoculate 1 L of LB
broth, and cells were grown at 37 °C to OD600 ≈
0.6–0.8. Expression was induced by the addition of 0.2 mM IPTG,
and the culture was incubated at 18 °C for 18 h. After harvesting,
the pellet was resuspended in 35 mL of denaturing LanA buffer 1 (6
M guanidine hydrochloride, 20 mM NaH2PO4, 500
mM NaCl, 0.5 mM imidazole, pH 7.5). The cell paste was subjected to
sonication, and the cell debris was removed by centrifugation. The
supernatant was purified by immobilized metal affinity chromatography
(IMAC) using His60 Ni Superflow Resin (Clontech), and the elution
fractions were desalted and purified by reversed-phase HPLC using
a Waters Delta-pak C4 column. The fractions were either analyzed directly
by MALDI-ToF-MS or treated with the appropriate endoprotease (Glu-C,
Asp-N, or TEV protease) to remove the leader peptide before MALDI-ToF-MS
and/or ESI-MS/MS analysis.
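For convenience, the composition of denaturing LanA buffer 1 given above can be converted into masses of each component for a chosen volume. The sketch below is a worked calculation only: the molar masses assume the anhydrous forms of the reagents, which is an assumption not stated in the text, and pH adjustment to 7.5 is not modeled.

```python
# Illustrative sketch only: molar masses assume anhydrous reagents.

# Approximate molar masses in g/mol.
MOLAR_MASS = {
    "guanidine hydrochloride": 95.53,
    "NaH2PO4": 119.98,
    "NaCl": 58.44,
    "imidazole": 68.08,
}

# Final concentrations of denaturing LanA buffer 1, in mol/L.
CONCENTRATION_M = {
    "guanidine hydrochloride": 6.0,
    "NaH2PO4": 0.020,
    "NaCl": 0.500,
    "imidazole": 0.0005,
}


def grams_needed(volume_l: float) -> dict:
    """Mass of each component (g) required for `volume_l` liters of buffer."""
    return {
        name: CONCENTRATION_M[name] * MOLAR_MASS[name] * volume_l
        for name in CONCENTRATION_M
    }


if __name__ == "__main__":
    for name, grams in grams_needed(1.0).items():
        print(f"{name}: {grams:.2f} g per liter")
```

If a hydrated salt is used instead, the corresponding molar mass would need to be substituted.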