Sebastian L Wenski1,2, Sirinthra Thiengmag1,2, Eric J N Helfrich1,2. 1. Institute for Molecular Bio Science, Goethe University Frankfurt, 60438, Frankfurt am Main, Germany. 2. LOEWE Center for Translational Biodiversity Genomics (TBG), 60325, Frankfurt am Main, Germany.
Abstract
Complex peptide natural products exhibit diverse biological functions and a wide range of physico-chemical properties. As a result, many peptides have entered the clinics for various applications. Two main routes for the biosynthesis of complex peptides have evolved in nature: ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic pathways and non-ribosomal peptide synthetases (NRPSs). Insights into both bioorthogonal peptide biosynthetic strategies led to the establishment of universal principles for each of the two routes. These universal rules can be leveraged for the targeted identification of novel peptide biosynthetic blueprints in genome sequences and used for the rational engineering of biosynthetic pathways to produce non-natural peptides. In this review, we contrast the key principles of both biosynthetic routes and compare the different biochemical strategies to install the most frequently encountered peptide modifications. In addition, the influence of the fundamentally different biosynthetic principles on past, current and future engineering approaches is illustrated. Despite the different biosynthetic principles of both peptide biosynthetic routes, the arsenal of characterized peptide modifications encountered in RiPP and NRPS systems is largely overlapping. The continuous expansion of the biocatalytic toolbox of peptide modifying enzymes for both routes paves the way towards the production of complex tailor-made peptides and opens up the possibility to produce NRPS-derived peptides using the ribosomal route and vice versa.
Bioactive peptides can be found across all kingdoms of life [1]. Textbook knowledge separates the biosynthesis of complex peptide natural products (NPs) into ribosomally synthesized peptides and peptides that are produced independently of the classical ribosomal route [1–3]. The ribosomal route can be further subdivided into the evolutionarily ancient class of cationic and amphiphilic antimicrobial peptides (AMPs), mainly involved in primary immune defense of plants and animals [3], and ribosomally synthesized and post-translationally modified peptides (RiPPs), which are distributed throughout the tree of life and show a wide diversity of biological functions [1]. AMPs predominantly target cell membranes and are usually active against a broad range of bacteria [4]. RiPPs, on the other hand, exhibit diverse biological functions ranging from antimicrobials (e.g., lantibiotics) [5], co-factors (e.g., PQQ) [6], to hormones (e.g., tri- and tetraiodothyronine) [7]. RiPP-derived peptides are ribosomally synthesized and then undergo an extensive post-translational modification process, before proteolytic cleavage results in the release of the mature peptide NP [3,8]. Moreover, five different ribosome-independent routes for the production of complex peptide NPs have been described [9]. These types differ in their carrier protein or tRNA dependency, the mode of amide bond formation, and their modular or non-modular protein architecture (for a comprehensive review see Ref. [9]). Among these, the most prominent group of enzymatic machineries for the biosynthesis of complex peptides are modular non-ribosomal peptide synthetases (NRPSs), which act in an assembly line-like fashion [2,9]. Non-ribosomal peptides (NRPs), unlike ribosomally synthesized peptides, are not limited to the 20 proteinogenic amino acids (AAs) but harbor a variety of naturally occurring, non-proteinogenic AAs in their peptide backbones [1,10].
This review focuses on the comparison of multi-modular NRPSs and RiPPs.
Extensive bioactivity-guided screening efforts have resulted in the isolation of numerous highly complex peptide NPs since the beginning of the golden age of antibiotics [11]. These efforts have led to the discovery of most NP-derived drugs that we are using today. Examples of complex peptides of medical relevance include the immunosuppressant cyclosporin (1) (NRP) [12], the antibiotic of last resort vancomycin (2) (NRP) [12], the drug candidate for the treatment of cystic fibrosis duramycin (3) (RiPP) [13] and the antibiotic nisin (4) (RiPP) (Fig. 1) [5,14]. Some of these metabolites were isolated more than half a century before the genetic blueprints that govern their biosynthesis were deciphered. As a result of the presence of many non-proteinogenic AAs in these peptides [12], it was speculated that they originate from a ribosome-independent route. It was not until 1988, or 49 years after the initial isolation, that the first genes responsible for the biosynthesis of the NRP gramicidin were identified [15–17]. It took about a decade to decipher the universal principles that govern NRPS biosynthesis [18–20]. Even though the biosynthesis of epidermin, which is structurally related to nisin (4), was elucidated in 1988 and showed that complex peptide NPs can also be biosynthesized via the ribosomal route [21], the historic misconception that complex peptides that harbor non-proteinogenic AAs are likely NRPS-derived persists to this day. Recent examples of RiPP-derived peptides that were initially believed to be of NRPS origin include the polytheonamides, which are extremely cytotoxic 48-mer pore formers. Polytheonamide biosynthesis involves a total of 49 post-translational modifications resulting in the formation of 28 non-proteinogenic amino acids [22,23].
Moreover, the hexapeptide tryptorubin A, which is characterized by a highly rigid three-dimensional (3-D) shape, was initially believed to be an NRPS product [24]. Reevaluation of tryptorubin biosynthesis, however, led to the realization that tryptorubin A is the first member of a new RiPP family [25]. This misconception can, at least in part, be attributed to the better understanding of NRP biosynthesis. Insights into NRP biosynthesis have resulted in the development of sophisticated bioinformatic platforms for the identification and annotation of NRPS biosynthetic gene clusters (BGCs) in genome sequences and the structural prediction of the associated peptides. RiPP BGCs that do not belong to the well-characterized RiPP families, on the other hand, are significantly more challenging to identify in genome sequences. As a result, and even though RiPPs have been the fastest growing class of NPs, we are currently not able to chart the full RiPP biosynthetic potential encoded in microbial genome sequences. While only 22 RiPP families had been reported in 2013 [1], more than 40 RiPP families are known today [8].
Fig. 1
Structures of the medically relevant peptides cyclosporin (1) (NRP), vancomycin (2) (NRP), duramycin (3) (RiPP), and nisin (4) (RiPP).
Since the arsenal of AA modifications encountered in RiPP and NRPS systems is widely overlapping, it becomes increasingly difficult to assign a peptide to its biosynthetic origin without having access to the genome sequence of the producer. On the other hand, the realization that both RiPPs and NRPSs use different strategies to introduce a largely overlapping arsenal of peptide modifications opens up an entirely new avenue: the production of NRPS-derived peptides using the ribosomal route and vice versa.
This review will briefly contrast the key biosynthetic features of NRPS and RiPP biosynthesis, describe the biosynthetic origins of the most common peptide modifications and highlight the different genome mining approaches for the targeted identification of novel RiPP and NRPS-derived peptides. We will then showcase the progress, promise and obstacles in the engineering of both peptide biosynthetic routes and venture a look ahead into the future of engineering peptide biosynthetic pathways.
Biosynthesis of complex peptide natural products
The biosynthetic principles of NRPSs and RiPPs fundamentally differ with respect to recognition, activation and condensation of the AA building blocks as well as the subsequent modification and maturation of the corresponding peptides (Fig. 2). The following section briefly introduces and contrasts the biosynthetic principles of NRPS and RiPP pathways.
Fig. 2
Schematic overview of the key principles of NRP and RiPP biosynthesis. (A) NRPS and BGC-encoded tailoring enzymes are ribosomally translated. The NRPS assembles the peptide, which is modified on-line (red circle) and released via intramolecular cyclization (purple bond). Posttranslational modifications (brown circle) via BGC-encoded enzymes result in the mature peptide. (B) Typical RiPP BGCs harbor genes encoding a precursor peptide, modifying enzymes and a protease. The ribosomally biosynthesized precursor peptide is post-translationally modified. The leader peptide (grey balls) serves as a recognition sequence for tailoring enzymes that modify the core peptide sequence (colored balls). Once the core peptide is fully modified (purple bond, small red and brown circles) it is released from the leader peptide by a protease. A: Adenylation domain; P: Peptidyl carrier protein; C: Condensation domain; T: Thioesterase domain. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
NRPS-derived peptide biosynthesis
NRPSs can be subdivided into a nonlinear, iterative, or linear type [26]. This review focuses on linear NRPSs, which constitute the most prominent subclass. Modular, non-iterative NRPSs are large mega enzyme complexes that resemble assembly lines. These systems, much like assembly lines in manufacturing processes, can be subdivided into individual modules. Each module incorporates one building block into the nascent oligopeptide chain. The biosynthesis of NRPs is directional and starts with the loading of the first AA at the N-terminal module and ends with the release of the peptide at the C-terminal module. The primary sequence of the resulting oligopeptide results from the selection of the incorporated AAs by each module. Facultative enzymatic domains in each module modify the AA incorporated by the respective module. This correlation between NRPS architecture and the structure of the associated peptide is referred to as the colinearity rule.
A module is composed of an adenylation (A) domain, which selects and activates an AA, a peptidyl carrier protein (PCP/P), responsible for the tethering of the activated AA and the nascent peptide intermediate, and a condensation (C) domain that catalyzes peptide bond formation [27] (Fig. 2). Similar to other adenylating enzymes like acyl-CoA synthetases and firefly luciferases, A domains catalyze two reactions: an initial adenylation of a free AA followed by the transfer of the activated AA onto the phosphopantetheinyl arm of a PCP domain [19,28]. Since a variety of non-proteinogenic AAs can be selectively activated, these domains are the driving force of structural diversity of NRPs [2,10]. Commonly occurring non-proteinogenic AAs in NRPs are frequently hydroxylated, methylated or halogenated. In addition, β- or homo-AAs are often incorporated [2,10]. These unusual building blocks are derived from primary metabolites that are either modified in an NRPS-dependent manner or by other gene cluster-encoded enzymes (Fig. 2).
The NRPS-dependent generation of non-proteinogenic AAs is catalyzed by stand-alone NRPSs with the domain architecture A-PCP or A-PCP-X (X: variable modification domain). AAs are activated, loaded onto these monomodular NRPSs, and presented to trans-acting modifying enzymes [2,10]. The generation of hydroxylated AAs illustrates the diversity of building block biosynthesis: AAs can be hydroxylated either in an NRPS-dependent (e.g., echinomycin or kutzneride) or independent (e.g., calcium-dependent antibiotic CDA) fashion [29–31]. The corresponding hydroxylation is performed either via cytochrome P450-monooxygenases (e.g., echinomycin) or α-ketoglutarate-dependent hydroxylases (e.g., kutzneride or CDA) [29–31]. Similarly, halogenation can also occur on free as well as on NRPS-bound AAs [32,33]. Modified AAs can be hydrolytically released and subsequently re-activated by an A domain of the core NRPS; alternatively, aminoacyl transferases catalyze the direct transfer of the modified AAs from the PCP of the stand-alone NRPS to the corresponding PCP of the core NRPS [2,34] (Fig. 2).
A domains harbor a binding pocket with multiple specificity-conferring residues responsible for the selective AA recognition [19]. These conserved residues can be used to predict the A domain's substrate specificity [19,20]. In some cases, A domains are able to recognize multiple, structurally related AAs as the specificity-conferring residues exhibit nearly identical binding affinities for multiple building blocks [2,35]. Consequently, the corresponding NRPS generates a mixture of NRP analogs which, based on the so-called screening hypothesis, was postulated to function as a means of adaptation to changing environmental challenges [36–38]. This biosynthetic promiscuity can therefore be regarded as a feature rather than a bug of the system.
PCP domains belong to the large family of carrier proteins [39].
Phosphopantetheinyl transferases catalyze the attachment of a phosphopantetheinyl arm to carrier proteins which results in the formation of active holo PCP domains [40]. During NRPS biosynthesis, AA building blocks and peptide intermediates are covalently attached to the phosphopantetheinyl arm of PCP domains via a thioester bond. PCPs are responsible for shuttling activated AAs and the growing peptide chain to the catalytic centers of the module-encoded domains and trans-acting modifying enzymes [40]. Since building blocks and peptidyl intermediates are covalently bound to the PCP domains, the NRPS serves as a template for the biosynthesis of complex peptides and the biosynthetic principle is therefore also referred to as thiotemplated biosynthesis.
As soon as the PCPs of two adjacent modules are loaded, the C domain catalyzes the formation of a peptide bond via nucleophilic attack of the α-amino group of the AA bound to the C-terminal (downstream) module onto the thioester of the intermediate bound to the N-terminal (upstream) module. As a result, the growing oligopeptide is elongated by one building block and transferred onto the C-terminal PCP before the elongation process repeats at the next module [41]. In addition to canonical C domains, several homologues have been described which are involved in heterocyclization, epimerization, attachment of carboxylic acids or oligopeptide release ([42]; for a comprehensive review see Ref. [43]).
After the final elongation, the oligopeptide is transferred onto a type I thioesterase (TEI) domain via transesterification to form a TE-oligopeptide ester. TEI domains catalyze either a water-mediated hydrolysis to release a linear product or an intramolecular attack from a hydroxyl or amino group which results in macrolactone or macrolactam formation (Fig. 2) [44,45].
Insights into the biosynthesis of NRPS-derived peptides have resulted in the development of several generations of highly sophisticated bioinformatic platforms for the identification and annotation of NRPS BGCs in genome sequences and the prediction of A domain substrate specificities. NRPS BGCs can be identified by hard-coded biosynthetic rules as NRPS core genes are composed of different arrangements of conserved biosynthetic domains [46–48]. This conservation enables domain annotations by profile Hidden Markov Models. The identification of specificity-conferring residues in the substrate binding pocket of A domains, also referred to as the Stachelhaus code, allows the prediction of the incorporated AA building blocks and thus the prediction of the peptide's primary AA sequence [19,20]. Several algorithms for the prediction of A domain substrate specificities have been developed since the initial description of the Stachelhaus code, which has resulted in predictions with an ever-increasing accuracy. This increased accuracy can be attributed to the evolution of genome mining algorithms and the steady increase in characterized NRPSs that were used to train novel genome mining algorithms [46–51]. Despite these advancements in A domain substrate specificity predictions, the promiscuity of many A domains often impedes the accurate prediction of the primary AA sequence. Even though the primary peptide sequence of NRPS-derived peptides cannot be predicted with as much confidence as in the case of ribosomally synthesized peptides, NRPSs nevertheless belong to the NP biosynthetic machineries that can be best studied using current genome mining platforms [46,48,50,52]. This advantage over other NP classes such as RiPPs can be attributed to their well-studied assembly line-like character.
Conceptually, new NRP scaffolds are the results of novel arrangements of the limited set of NRPS modules with different A domains to form new NRPS architectures. Due to this simple biosynthetic principle, the full biochemical space of canonical NRPSs can be charted by state-of-the-art bioinformatic platforms.
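The colinearity rule and the signature-based prediction of A domain specificity described in this section can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the 10-residue signatures are made-up placeholders rather than published Stachelhaus codes, the `Module` class, the function names and the 80% identity threshold are hypothetical, and real genome mining platforms rely on profile Hidden Markov Models and trained classifiers rather than this simple lookup.

```python
from dataclasses import dataclass

# Placeholder binding-pocket signatures (NOT real Stachelhaus codes),
# mapped to the building block they are assumed to select.
KNOWN_SIGNATURES = {
    "AAAAAAAAAK": "Phe",
    "CCCCCCCCCK": "Val",
    "DDDDDDDDDK": "Orn",
}


@dataclass
class Module:
    """One NRPS module, reduced to the features used in this sketch."""
    signature: str              # specificity-conferring residues of the A domain
    has_e_domain: bool = False  # optional epimerization (E) domain


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length signatures."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_substrate(signature: str, min_identity: float = 0.8):
    """Return (building block, identity) for the closest known signature.

    Signatures below the (hypothetical) threshold are reported as 'X'.
    """
    best = max(KNOWN_SIGNATURES, key=lambda known: identity(signature, known))
    score = identity(signature, best)
    if score < min_identity:
        return "X", score
    return KNOWN_SIGNATURES[best], score


def predict_peptide(modules):
    """Apply the colinearity rule: one building block per module, in order."""
    residues = []
    for module in modules:
        aa, _ = predict_substrate(module.signature)
        # An E domain in the module epimerizes the incorporated building block.
        prefix = "D-" if module.has_e_domain and aa != "X" else ""
        residues.append(prefix + aa)
    return residues


assembly_line = [
    Module("AAAAAAAAAK", has_e_domain=True),  # exact match
    Module("CCCCCCCCAK"),                     # 9 of 10 positions match
    Module("EEEEEEEEEE"),                     # no close match
]
print(predict_peptide(assembly_line))
```

The third module is reported as `X` because its signature resembles none of the placeholder entries, which mirrors the situation in which an A domain cannot be assigned with confidence.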
RiPP-derived peptide biosynthesis
Unlike multi-domain mega synthases that catalyze the biosynthesis of complex peptides in an assembly line-like fashion, RiPPs utilize the classical ribosomal route for the production of peptide precursors [53]. The architecture of a typical RiPP BGC consists of a structural gene encoding a precursor peptide, gene(s) involved in peptide modification and a protease (Fig. 2). A ribosomally biosynthesized precursor peptide generally comprises an N-terminal leader peptide, a C-terminal core peptide, and occasionally a C-terminal follower peptide (e.g. bottromycin) [1,8]. In rare cases, N-terminal leader and C-terminal follower peptides can be found within the same precursor peptide (e.g., pantocin) [54,55]. During ribosomal peptide biosynthesis, aminoacyl-tRNA synthetases have a function similar to A domains in NRPSs. Each aminoacyl-tRNA synthetase catalyzes the ATP-dependent activation of its dedicated AA to form an aminoacyl-adenylate [10]. The activated AA is then transferred to its corresponding tRNA in a transesterification reaction, resulting in an aminoacyl-tRNA which is then used in ribosomal peptide biosynthesis [10,56]. However, A domains and aminoacyl-tRNA synthetases share neither sequence nor structural similarities [57]. Following translation by the ribosome, the precursor peptides undergo extensive post-translational modifications (PTMs) catalyzed by an ever-increasing set of modifying enzymes (Fig. 2). The N-terminal leader and C-terminal follower peptides serve as recognition sequences for modifying enzymes. In addition, leader/follower peptides play a role in guiding PTM enzymes to perform modifications in the correct order [1,8]. Once the core peptide is fully modified, it is released from the leader and/or follower peptide by proteolytic cleavage (Fig. 2). Typically, specific proteases are encoded in RiPP BGCs but more and more pathways are reported that utilize ubiquitous cellular proteases for the release of the mature RiPP [1,8,25]. 
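The leader/core/follower organization of a precursor peptide described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequence is a made-up toy example, the core boundaries are supplied by hand (in practice they must be inferred or determined experimentally), and `Precursor` and `split_precursor` are hypothetical names.

```python
from typing import NamedTuple


class Precursor(NamedTuple):
    leader: str    # N-terminal recognition sequence
    core: str      # region that becomes the mature peptide
    follower: str  # optional C-terminal recognition sequence ('' if absent)


def split_precursor(sequence: str, core_start: int, core_end: int) -> Precursor:
    """Split a precursor peptide at the given core boundaries.

    core_start and core_end are 0-based and end-exclusive, as in Python slices.
    """
    if not 0 <= core_start < core_end <= len(sequence):
        raise ValueError("core boundaries must lie within the sequence")
    return Precursor(
        leader=sequence[:core_start],
        core=sequence[core_start:core_end],
        follower=sequence[core_end:],
    )


# Toy sequence, not a real precursor peptide.
example = split_precursor("MKELQAGTSCSWCK", core_start=6, core_end=12)
print(example)
```

Proteolytic release of the mature peptide corresponds to keeping only the `core` field once all modifications have been installed.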
Exceptions from this simple RiPP biosynthetic principle include the cyanobactin, dikaritin and cyclotide BGCs in which multiple core peptides are encoded in one large precursor gene [58]. In these cases, the precursor peptide contains an N-terminal leader peptide followed by multiple core peptides. Highly conserved N- (RSII) and C-terminal (RSIII) recognition sequences flank each core peptide. These recognition sequences are required to guide proteolytic cleavage and macrocyclization. Depending on the RiPP family, these multiple core peptides can either have the same sequence or differ extensively as shown for the highly variable core peptides of the cyanobactins [59].
Even though RiPPs follow the seemingly simple and universal biosynthetic principle outlined above, they are a very inhomogeneous subclass of peptide NPs. Thus, each RiPP family bears a unique biosynthetic logic [60,61]. This inhomogeneity is reflected in the wide range of genome mining platforms developed to chart the biosynthetic space of individual families rather than the entire RiPP biosynthetic diversity [60]. While precursor peptides and modifying enzymes show a high degree of homology within each RiPP family, no core genes or domains are conserved between all RiPP families [62,63]. Some RiPP families are characterized by conserved precursor genes; others employ characteristic tailoring enzymes based on which they can be identified in a bait-based approach. These bait-based approaches allow the identification of novel members of characterized RiPP families with high confidence [48,50,52,64–67]. Based on the biosynthetic insights from other members of the same RiPP family, the precursor sequence and the type of post-translational modifications can usually be predicted, yet the number and regioselectivity of these modifications are usually not predictable [68].
To chart the RiPP biosynthetic potential beyond RiPP family boundaries, classical homology-based approaches have been complemented with several machine learning-based algorithms for the targeted identification of novel RiPP families [69–72]. These algorithms target putative precursor genes, recognition elements that are conserved amongst several RiPP families and tailoring enzymes with homology to modifying enzymes of characterized RiPPs [69–72]. While these highly sophisticated approaches will likely expand RiPP biosynthetic space significantly, there is currently no individual tool available which is capable of charting the full biosynthetic space of RiPP-derived NPs [68,73].
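For the multi-core precursors described earlier in this section, in which each core peptide is flanked by an N-terminal (RSII) and a C-terminal (RSIII) recognition sequence, the extraction of core peptides can be sketched with a regular expression. The motifs and the toy precursor below are placeholders chosen for illustration, and `extract_cores` is a hypothetical helper; dedicated genome mining tools use considerably more elaborate models.

```python
import re


def extract_cores(precursor: str, rs2: str, rs3: str):
    """Return every core peptide flanked by the rs2 (N-terminal) and
    rs3 (C-terminal) recognition sequences, in order of appearance."""
    pattern = re.compile(re.escape(rs2) + r"(.+?)" + re.escape(rs3))
    return [match.group(1) for match in pattern.finditer(precursor)]


# Toy precursor: leader, then two cassettes of RSII + core + RSIII.
RS2, RS3 = "GLEAS", "AYDG"  # placeholder motifs for this sketch
toy_precursor = (
    "MNKKNILPQ"
    + RS2 + "VTACITFC" + RS3
    + RS2 + "ITVCISVC" + RS3 + "E"
)
cores = extract_cores(toy_precursor, RS2, RS3)
print(cores)
```

The non-greedy group `(.+?)` stops at the first C-terminal motif, so each cassette yields exactly one core peptide even when the cores differ in sequence.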
Peptide modifications
In comparison to RiPPs, NRPS-derived peptides are modified at different biosynthetic stages. The chemical diversity of NRPs is mainly based on adenylation domain substrate specificities, module order, composition and the presence of trans-acting enzymes. In rare cases, NRPs are modified after the oligopeptide has been released. In contrast, only proteinogenic AAs are incorporated into RiPP-derived peptides. After translation, the precursor peptide is modified, leading to the conversion of proteinogenic AAs into a wide range of non-proteinogenic AAs. In addition, some RiPP families are characterized by complex 3-D structures. In the following paragraphs, differences and similarities of modifications in NRPS- and RiPP-derived peptides are discussed (Fig. 3).
Fig. 3
The most frequently encountered modifications in NRPs and RiPPs. The color code indicates whether a modification has been described for NRPS-derived peptides (blue), RiPPs (green) or was reported for both RiPP and NRPS routes (yellow). Modifications labeled with an asterisk (*) can occur at different positions and atoms. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Epimerization
The incorporation of ᴅ-configured AAs influences the overall structure and bioactivity of peptides and prevents degradation by proteases [74]. NRPSs introduce ᴅ-AAs via epimerization (E), dual condensation/epimerization (CE) and A domains (Fig. 3) [42,75–77]. The epimerization via E and CE domains takes place after the incorporation of an ʟ-AA. The position of the module-encoded E or CE domains in the NRPS assembly line determines which AA is epimerized. The catalytic mechanism of E domains is proposed to comprise α-proton abstraction via the catalytic glutamate of the E domain followed by racemization [42,78]. Interestingly, in CE domains the catalytic glutamate is missing, suggesting an alternative mode of epimerization [2,42,78]. Epimerization domains are followed by a distinct C domain subtype, the DCL type [42]. The stereoselectivity of the DCL domain enables the exclusive incorporation of the ᴅ-isomer from the racemic mixture generated by the E domain [79]. In rare cases, A domains were shown to directly activate ᴅ-AAs that are generated via cytoplasmic racemases [75–77].
In contrast to NRPs, direct incorporation of ᴅ-AAs into RiPP scaffolds does not occur in nature since ribosomal peptides are restricted to the 19 proteinogenic ʟ-configured AAs and glycine. Therefore, incorporation of ᴅ-AAs requires post-translational epimerization by PTM enzymes. In RiPPs, epimerization can occur via a direct or an indirect mechanism [8]. Direct epimerization of ʟ-AAs to ᴅ-AAs has been shown to follow a radical mechanism catalyzed by radical S-adenosylmethionine (rSAM) enzymes. PoyD is the first characterized rSAM epimerase that performs a total of 18 epimerizations in an alternating fashion during polytheonamide biosynthesis in a C- to N-directional manner [22,23,80].
AA epimerization by PoyD-like enzymes is initiated by reductive cleavage of SAM to generate a 5′-deoxyadenosyl radical (5′-dA•) which abstracts the Cα H-atom from ʟ-AAs to form a carbon-centered radical. The thiolate proton of cysteine in the epimerase is then transferred to the radical intermediate leading to the formation of ᴅ-AAs in a radical rebound mechanism [81]. In contrast to this radical mechanism, the indirect epimerization mechanism requires two enzymatic steps that have been reported for some lanthipeptides. Epimerization is initiated by dehydration of ʟ-serine to form dehydroalanine (Dha) by the dehydratase LanB. Once Dha is formed, the dehydrogenase LanJ catalyzes the diastereoselective hydrogenation to yield ᴅ-alanine [82,83]. Moreover, BotH-like enzymes, belonging to the subfamily of α/β hydrolase (ABH) fold proteins, have been shown to be involved in yet another epimerization route that converts ʟ-aspartate to ᴅ-aspartate during bottromycin biosynthesis. The proposed mechanism involves the self-abstraction of an α-proton by the carboxylic acid group of the AA side chain followed by proton transfer from a water molecule to yield the epimerized AA [84].
Heterocyclization
Thiazolines and (methyl)oxazolines are characteristic for multiple NRP and RiPP families (Fig. 3). During NRP biosynthesis, cyclization (Cy) domains, homologues of C domains, catalyze heterocycle formation in a bifunctional manner: First, the NRP intermediate is elongated by the Cy domain with cysteine, threonine or serine. Subsequently, Cy domains catalyze the nucleophilic attack of the thiol or hydroxy group of the AA side chain onto the carbonyl carbon of the amide bond. Subsequent dehydration results in (thi/ox-)azoline formation [85]. Thiazoline or oxazoline heterocycles can either be reduced via trans-acting reductases to form thiazolidines/oxazolidines or oxidized via cis- or trans-acting oxygenases to form thiazoles/oxazoles [86–90]. Cysteine-, threonine- or serine-derived heterocycles have also been reported for many RiPPs including linear azol(in)e-containing peptides (e.g., microcin B17) [91], bottromycin [92], and cyanobactins (e.g., patellamide A) [93]. The mechanism underlying thiazole and oxazole formation requires an enzyme complex that is composed of one or two cyclodehydratase(s) (C/D protein) and a dehydrogenase (B protein). First, cysteine, threonine or serine undergo ATP-dependent cyclodehydration catalyzed by the cyclodehydratase to form an azoline heterocycle. Then, a flavin-dependent dehydrogenase oxidizes the azoline rings to yield azole heterocycles. In some cases, including cyanobactins (e.g., trunkamide) and half of the known linear azol(in)e-containing peptides (LAPs), C- and D-proteins are fused into one single enzyme [8,94,95].
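The two-step logic outlined above (cyclodehydration to an azoline, followed by dehydrogenation to an azole) translates into characteristic mass shifts that are commonly used when peptides are analyzed by mass spectrometry. The following Python sketch shows the arithmetic; it assumes that each cyclodehydration corresponds to the net loss of one water molecule and each oxidation to the loss of two hydrogen atoms. The monoisotopic masses, the function names and the toy core sequence are additions for illustration and do not come from the review.

```python
# Monoisotopic masses in Da (standard values, added for this sketch).
H2O = 18.010565
H2 = 2.015650


def heterocycle_mass_shift(n_azolines: int, n_azoles: int) -> float:
    """Expected mass change relative to the unmodified peptide.

    n_azolines: rings that stop at the azoline stage (-H2O each)
    n_azoles:   rings oxidized further to azoles (-H2O and -H2 each)
    """
    if n_azolines < 0 or n_azoles < 0:
        raise ValueError("ring counts must be non-negative")
    return -(n_azolines + n_azoles) * H2O - n_azoles * H2


def heterocyclizable_positions(core: str):
    """0-based positions of Cys, Ser and Thr, the residues that can in
    principle be converted into (thi/ox-)azoline rings."""
    return [i for i, aa in enumerate(core) if aa in "CST"]


toy_core = "GTSCSW"  # made-up sequence
print(heterocyclizable_positions(toy_core))
print(round(heterocycle_mass_shift(n_azolines=2, n_azoles=1), 4))
```

Which of the candidate residues are actually modified cannot be read from the sequence alone, so the position list is an upper bound rather than a prediction.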
Methylation
Peptide methylation is a simple way to influence polarity, improve proteolytic resistance, facilitate cellular uptake and increase the half-life of a peptide, all of which are important parameters for bioavailability [96]. In NRPSs, the majority of methylation (M) domains act in cis and are directly integrated into a flexible loop of the A domain. M domains catalyze mainly N- but also O-, C- or S-methylations in a SAM-dependent manner [2,97–99] (Fig. 3). Remarkably, during cyclosporin biosynthesis, methylation is critical for the overall peptide assembly and cyclization [100]. In rare cases, methylation takes place after peptide release via tailoring enzymes like the stand-alone methyltransferase MtfA in glycopeptide biosynthesis [101].
Polytheonamide, one of the most densely modified RiPPs, has been shown to carry eight N-methylations and seventeen C-methylations. A rSAM-dependent methyltransferase, PoyE, is responsible for all N-methylations, while the two vitamin B12-dependent C-methyltransferases, PoyB and PoyC, catalyze all C-methylations [22,23]. PoyC has been shown to catalyze the homolytic cleavage of SAM to generate 5′-dA• that abstracts the Cβ H-atom of ʟ-valine residues. The carbon-centered valine radical likely reacts with the methyl radical to yield Cβ methyl-valine [80]. Computational studies of polytheonamide B revealed that N-methylation stabilizes the unusual β-helix conformation [102]. Besides N- and C-methylation, a rare S-methylation of cysteine residues mediated by a SAM-dependent methyltransferase takes place during the biosynthesis of a cryptic proteusin in the sponge symbiont "Candidatus Entotheonella factor" (Fig. 3) [103]. Moreover, a rSAM-dependent methyltransferase that methylates carboxyl groups has been reported during the final step of bottromycin biosynthesis resulting in methyl ester formation at the aspartate residue [92].
Macrocyclization
Macrocyclization is a common feature found in peptide NPs, derived from both NRPS and RiPP pathways [104]. Macrocyclization plays a crucial role for the biological activity of many compounds and enhances peptide stability by protecting from proteolytic digestion [53,105]. Furthermore, during NRP biosynthesis, head-to-tail or side-chain-to-tail macrocyclizations are essential for peptide release. Here, TEI domains or terminal C domains catalyze the intramolecular nucleophilic attack on the (thio-)ester which leads to macrolactam or macrolactone formation (Fig. 3) [2,45,106,107]. In rare cases, including gramicidin biosynthesis, the peptide dimerizes upon release [108]. Furthermore, reductase (R) domains indirectly catalyze macrocyclizations via a reductive release mechanism [109]. R domains reduce the thioester under NAD(P)H consumption leading to the release of reactive aldehydes [110,111]. The resulting peptide aldehydes undergo further modifications like spontaneous intramolecular cyclizations to form an imine/carbinolamine [109].
Diverse macrocyclizations have been reported from RiPP systems ranging from head-to-tail amide formation (e.g., cyanobactin), sulfide bond formation (e.g., glycocin), the formation of lactone/lactam rings between AA side chains (e.g., microviridins) as well as macrocyclization via C–C and C–S bond formation [112] (Fig. 3). Head-to-tail macrocyclization is a common PTM found in RiPPs [105]. Several enzymes involved in macrocyclization have been experimentally characterized including the PatG protease that is involved in patellamide biosynthesis [8]. PatG contains a subtilisin-like serine protease domain which recognizes and cleaves a signal sequence on the precursor peptide. This cleavage results in the formation of an acyl-enzyme intermediate, where a serine residue of the catalytic triad of the protease is bound to the core peptide. 
In the next step, the acyl-enzyme intermediate is attacked by the N-terminal amino group of the peptide to form a macrolactam. This mechanism is similar to macrocyclizations catalyzed by TE domains in NRPSs that contain the same catalytic triad as the serine protease PatG and that likewise mediate macrocyclization via an acyl-enzyme intermediate [104]. Macrocyclization via Michael-type addition is an alternative mechanism to construct cyclic structures in RiPPs as shown for the formation of lanthionine bridges during lanthipeptide biosynthesis [112]. A lanthionine (Lan) bridge is a thioether crosslink between the β-carbons of serine/threonine and cysteine [83]. In nisin biosynthesis, for instance, it has been shown that Lan and (methyl)Lan formation is a two-step process. Lan formation is initiated with the conversion of serine and threonine to Dha and dehydrobutyrine (Dhb), respectively, by a dehydratase (NisB). Then, a cyclase (NisC) catalyzes the 1,4-nucleophilic attack of the cysteine thiol on the β-carbon of the dehydro AAs, which results in a thioether enolate [83]. The enolate can either be protonated to form a lanthionine bridge or attack another Dha to generate a labionin crosslink [83]. While the lanthionine motif is unique for RiPP-derived peptides and has not been described for NRPs, Dha has also been proposed to be an intermediate of the NRPS-derived pyrrolizidine biosynthesis. During pyrrolizidine biosynthesis, the exomethylene side chain of Dha putatively acts as a nucleophile and attacks a carbonyl carbon resulting in cyclization by carbon-carbon bond formation [113,114].
Moreover, sactionine linkages, the name-defining modifications of the sactipeptides, are formed via a radical mechanism that has been characterized for subtilosin biosynthesis. The rSAM enzyme AlbA catalyzes the formation of the uniquely defined sactionine thioether cross-link between the thiol group of cysteine and the α-carbon of phenylalanine or threonine. The reaction is initiated by the cleavage of SAM to form 5′-dA•, which then abstracts a hydrogen atom from the α-carbon of phenylalanine or threonine [115]. The carbon-centered radical then attacks the sulfur atom to form the thioether bond [112].
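The two-step logic of (methyl)lanthionine formation described above (dehydration of Ser/Thr to Dha/Dhb, followed by Michael-type addition of a cysteine thiol) can be sketched as simple bookkeeping on a residue list. This is a minimal sketch under stated assumptions: the core sequence, the modified positions and the ring pairings are hypothetical inputs chosen for illustration, the function names are invented here, and nothing in the code predicts which residues an enzyme such as NisB or NisC would actually modify. The only chemistry encoded is that each dehydration removes one water (18.010565 Da) and that the subsequent thioether formation does not change the mass.

```python
from typing import List, Tuple

WATER = 18.010565  # monoisotopic mass of H2O in Da
DEHYDRO = {"S": "Dha", "T": "Dhb"}  # Ser -> dehydroalanine, Thr -> dehydrobutyrine


def dehydrate(core: str, positions: List[int]) -> List[str]:
    """Step 1: convert the Ser/Thr residues at the given 0-based positions."""
    residues = list(core)
    for i in positions:
        if residues[i] not in DEHYDRO:
            raise ValueError(f"position {i} is {residues[i]}, not Ser/Thr")
        residues[i] = DEHYDRO[residues[i]]
    return residues


def cyclize(residues: List[str], rings: List[Tuple[int, int]]) -> List[Tuple[str, int, int]]:
    """Step 2: pair each dehydro residue with a cysteine to label a thioether ring."""
    labels = []
    for acceptor, cys in rings:
        if residues[acceptor] not in ("Dha", "Dhb") or residues[cys] != "C":
            raise ValueError("a ring needs a dehydro residue and a cysteine")
        kind = "Lan" if residues[acceptor] == "Dha" else "MeLan"
        labels.append((kind, acceptor, cys))
    return labels


core = "ATSACTC"                      # hypothetical core peptide, not a natural sequence
dehydrated = dehydrate(core, [2, 5])  # assumed modification sites
rings = cyclize(dehydrated, [(2, 4), (5, 6)])
mass_shift = -WATER * 2               # two dehydrations; cyclization is mass-neutral
print(dehydrated, rings, round(mass_shift, 4))
```

Because the cyclization step is mass-neutral in this model, a mass shift alone distinguishes the number of dehydrations but not whether the rings have formed.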
β-amino acids
The incorporation of β-AAs leads to structural diversity and increases proteolytic resistance of peptides [116]. Many NRPs are composed of one or multiple β-AAs like β-tyrosine in chondramide [117], β-lysine and 2,3-diaminopropionate in viomycin [118] or β-alanine in cryptophycins ([119], for a comprehensive review see Ref. [116]). In NRPS systems, A domains recognize and subsequently activate free β-AAs as unusual building blocks, derived from a variety of different pathways [116]. In contrast, during RiPP biosynthesis β-AAs cannot be directly incorporated but proteinogenic AAs can be converted into β-AAs [120]. Piel, Morinaka and co-workers have recently identified a rSAM-dependent mode to convert proteinogenic AAs into α-keto-β-AAs. The reaction involves the unusual radical excision of tyramine and the rejoining of the remaining peptide fragments to create an α-keto-β-AA [120]. Moreover, the installation of a β-AA has been described for the lanthipeptide OlvA(BCSA). In the case of OlvA(BCSA), a SAM-dependent O-methyltransferase catalyzes the conversion of aspartate to the β-AA l-isoaspartate through methylation, followed by imide formation between the carboxyl group of the AA side chain and the amide bond of the peptide backbone. Subsequent imide hydrolysis can result in the transfer of the peptide backbone to the former aspartate side chain, resulting in β-AA formation [121].
Carbon-carbon bond formation
The introduction of carbon-carbon bonds strongly influences the structure and bioactivity of NPs. The bioactivity of vancomycin-like NRPs, for instance, depends on the formation of the rigid aglycon structure that is crucial for target binding [[122], [123], [124]]. During vancomycin biosynthesis, the peptide scaffold is cyclized by the cytochrome P450 oxygenases OxyA, OxyB, and OxyC through the introduction of two diaryl ether bridges and a biaryl bond (Fig. 3) [122]. Remarkably, the trans-acting oxygenases are recruited by a non-catalytic C domain-like domain, called X domain, which is embedded in the NRPS assembly line [125]. However, inactivation of the oxygenases has no influence on the biosynthesis of the peptide backbone [126].
Carbon-carbon bond formation in RiPPs is frequently catalyzed by rSAM enzymes (Fig. 3) [127]. The rSAM-catalyzed carbon-carbon bond formation between non-activated carbons of tryptophan and the β-carbon of lysine, for instance, has been characterized for darobactin biosynthesis [128]. Only recently, a second mode of carbon-carbon bond formation has been proposed. Tryptorubin A, a hexapeptide that is characterized by an unusually complex 3-D shape, features a carbon-carbon and two carbon-nitrogen bonds between two AA side chains and between one AA side chain and the peptide backbone, respectively, that are putatively installed by a single cytochrome P450 in an atropospecific fashion [25]. Shortly after the putative tryptorubin BGC was identified, similar BGC architectures were reported. The bicyclic tetrapeptide cittilin contains biaryl and aryl-oxygen-aryl ether bonds that are installed by the cytochrome P450 enzyme CitB [129]. Similarly, the cytochrome P450 monooxygenase BytO is responsible for the installation of an unusual biaryl bridge between tyrosine and histidine in a short pentapeptide precursor during the biosynthesis of the biarylitide tripeptide [130].
Lipidation/prenylation
Peptide lipidation increases stability, decreases polarity and can enable membrane interactions [131]. During NRPS biosynthesis, N-terminal C domains, so-called CStarter domains, catalyze the condensation of carboxylic acids, in the form of fatty acyl-coenzyme A esters, with the α-amino group of the first AA (Fig. 3). This N-acylation of NRPs is a common feature, connecting NRPSs with fatty acid metabolism [42,132]. In addition, diketopiperazines (e.g., cyclomarazine, echinulin, notoamide) but also linear peptides (e.g., cyclomarins) are frequently lipidated via prenylation with isoprene moieties [[133], [134], [135]] (Fig. 3). In contrast to N-acylation, prenyltransferases install prenyl groups at carbon or oxygen atoms after peptide release [135,136].
Lipidation in RiPPs is rare. Piel and co-workers have recently reported a new family of RiPP-derived lipopeptides, the selidamides [137]. Kamptornamide, phaeornamide and nostolysamides are the first members of the selidamide family of RiPP-derived lipopeptides that is characterized by fatty acyl moieties attached to the side chain of (hydroxy)ornithine or lysine, respectively. Heterologous expression studies suggest that every peptide of the selidamide family is selectively modified by a fatty acid with fixed chain length by members of GCN5-related N-acetyltransferase (GNAT)-like family of peptide maturases [137]. In the case of the lipolanthine family of RiPPs (e.g., microvionin and goadvionin), fatty acids are attached to the N-terminal amino group of the peptide backbone by members of the GNAT family [[138], [139], [140]]. Since lipolanthines are biosynthesized by hybrid polyketide (PK)-RiPP pathways, they will be discussed in the crosstalk section below.
Similarly, N-terminal acetylation of RiPPs has been reported for lasso peptides (e.g., albusnodin), the LAP (e.g., goadsporin) and microviridins, which is likewise catalyzed by members of the GNAT superfamily [[140], [141], [142]]. 
Prenylation is a common modification in cyanobactin biosynthesis (Fig. 3). The first characterized prenyltransferase, LynF, has been shown to catalyze the head or tail installation of isoprene units onto serine, threonine and tyrosine [143]. However, after enzymatic O-prenylation via the isoprene's C3 carbon (head), a non-enzymatic Claisen rearrangement results in the transfer of the prenyl group to the ortho position while simultaneously inverting the prenyl linkage from the C3 (head) to the C1 (tail) carbon of the isoprene unit [143,144]. Apart from cyanobactins, the quorum-sensing compound ComX is also prenylated. The prenyltransferase ComQ catalyzes the prenylation at the indolic C3 position of tryptophan in the core peptide with geranyl diphosphate (C10) or farnesyl diphosphate (C15) [145,146].
Further modifications
NRPs and RiPPs are further modified via the addition, subtraction or rearrangement of chemical residues, resulting in peptides with various physico-chemical properties and bioactivities [2,147].
Halogenated compounds are widespread in nature and are described for both NRPs and RiPPs (Fig. 3) [148]. Halogenated NRPs can either be generated via incorporation of free halogenated AAs [32] or via halogenation of the enzyme-bound oligopeptide through trans-acting halogenases ([149], for a comprehensive review see Ref. [147]). Halogenations are catalyzed via FADH2-dependent (e.g. kutzneride) [32] or non-heme-FeII α-ketoglutarate-dependent halogenases (e.g. syringomycin E) [33]. Similar to NRPSs, FADH2-dependent halogenases in RiPPs have been shown to catalyze tryptophan chlorination and bromination during the biosynthesis of the lanthipeptide NAI-107 and a sponge-derived proteusin, respectively [150,151].
Vancomycin and teicoplanin are glycosylated NRPs that belong to the family of glycopeptide antibiotics [152]. After the peptide is released from the NRPS, the glycosyltransferases tGtfA and tGtfB catalyze the glycosylation in a fixed order (Fig. 3) [153]. Glycosylation is described for the RiPP families glycocins [[154], [155], [156]], thiopeptides [157], lanthipeptides [158] and lasso peptides (Fig. 3) [159]. The lasso peptide pseudomycoidin, for instance, is glycosylated at a phosphorylated serine residue. Here, the nucleotidyltransferase PsmN is proposed to catalyze the installation of mono and/or dihexose residues onto the phosphorylated pseudomycoidin [159].
Epoxidations are present in NRPs as well as in RiPPs. During the biosynthesis of the NRP cyclomarin, an epoxide moiety is installed by a cytochrome P450 enzyme after the prenylation of phenylalanine [133]. Similarly, cytochrome P450 enzymes catalyze epoxidations as well as hydroxylations of RiPPs as shown during the biosynthesis of thiostrepton and thiopeptide GE2270 [160,161].
In rare cases (e.g. gramicidin or szentiamide), NRPs are formylated at the N-terminus [[162], [163], [164]]. In contrast to the fatty acid attachment mediated by CStarter domains, the formylation reaction of the first AA is catalyzed via formyl transferases under consumption of formyltetrahydrofolate [42,[162], [163], [164]]. To the best of our knowledge, no formylated RiPPs have been reported to date.
In peptide NPs, peptide bonds can be transformed into thioamides. The YcaO homolog TvaH has been proposed to catalyze thioamidation during the biosynthesis of thioviridamide, a member of the thioamitide RiPP family [165].
In addition to the frequently encountered peptide modifications, different modes for the formation of non-proteinogenic AAs have been described. In NRPS systems, non-proteinogenic AAs are directly incorporated, whereas proteinogenic AAs have to be transformed into non-proteinogenic AAs after incorporation into the precursor peptide in RiPPs. The conversion of arginine into the non-proteinogenic AA ornithine by an arginase during landornamide A biosynthesis is an example of this biosynthetic strategy [166].
Crosstalk to other NP pathways
The diversity of NRPS- and RiPP-derived peptides is not restricted to the incorporation of different AA building blocks or the ever-increasing number of characterized tailoring reactions and enzymatic domains. In addition, NRPSs as well as RiPPs interact with other pathways to generate hybrid NPs. This hybridization takes place in a precursor-directed manner via the integration of intermediates or products from other pathways. Furthermore, the biosynthetic core enzymes directly interact either in a covalent or non-covalent manner. This chapter briefly describes different types of hybrids with involvement of NRPSs or RiPPs.
NRPS
NRPSs are predestined for hybridization with other pathways as A domains are able to recognize a wide variety of substrates beyond the 20 proteinogenic AAs. Moreover, NRPs are biosynthesized in a thiotemplated, assembly line-like fashion, a common mode of biosynthesis that NRPSs share with polyketide synthases (PKSs) and fatty acid synthases (FASs).
Since the boundary between the incorporation of unusual building blocks and the interaction with further metabolic pathways is fluid, we consider any exchange between metabolic pathways as crosstalk (Fig. 4). The non-proteinogenic AA dihydroxyphenylglycine, for instance, is generated by the type III PKS DpgA and is subsequently incorporated into the growing balhimycin NRP, a vancomycin-type antibiotic [167,168]. Similarly, during cyclosporin biosynthesis, a PKS is involved in the formation of the non-proteinogenic AA 4-(2-butenyl)-4-methyl-threonine [169]. In addition, unusual building blocks, derived from primary metabolic pathways like the citrate cycle, are often incorporated after further modifications ([170], for a comprehensive review see [2]). During the biosynthesis of CDA, for instance, α-ketoglutarate is methylated and subsequently transaminated to generate 3-methyl glutamate [170].
Fig. 4
Crosstalk of NRPSs with other pathways. (A) Precursor-based crosstalk involves the direct incorporation of building blocks that are biosynthesized by other pathways. (B) Core enzyme-based crosstalk can be subdivided into covalent (1, 2, 3) and non-covalent interactions (4, 5, 6). KS: Ketosynthase; AT: Acyltransferase; ACP: Acyl carrier protein; C: Condensation domain; A: Adenylation domain; P: Peptidyl carrier protein.
Furthermore, PCP domains, which belong to the carrier protein family, play a central role during hybridization of NRPSs with other pathways (Fig. 4). Acyl carrier proteins (ACPs) belong to the same protein family and are involved in the biosynthesis of PKs and fatty acids [39]. Even though PCPs and ACPs do not share high sequence similarities, their protein structure is conserved and enables highly specific interactions with many different enzymes [18,39]. Therefore, it is not surprising that NRPSs often hybridize with other thiotemplated enzymes like FASs or PKSs. In fungi and bacteria, a number of hybrid mega-synthases have been reported that are derived from the fusion of NRPSs and PKSs [[171], [172], [173]]. Here, the carrier protein is the connecting component between the different pathways [172]. Fungal hybrids are frequently composed of N-terminal PKS- and C-terminal NRPS-modules (PKS-NRPS-hybrids) and use an ACP as fusion point (e.g. fusarin C or equisetin A) (Fig. 4) [172,174,175]. In bacterial mega-synthases, responsible for the generation of barnesin A or glidobactin, the organization is inverted (NRPS-PKS-hybrids) with the PCP serving as connecting domain [173,176] (Fig. 4). Additional NRPS-PKS hybrid organizations have been reported including PKSs that harbor multiple isolated NRPS modules and vice versa. Moreover, NRPSs transiently interact with PKS as well as FAS via their ACPs or PCPs (Fig. 4) [177,178]. However, carrier proteins are not essential for the connection of NRPS and PKS assembly lines. 
The free-standing C domains in the biosynthesis of fabclavines and zeamines, for instance, mediate the crosstalk via the condensation of a PKS-derived polyamine with an NRPS-bound peptide (Fig. 4) [179,180]. Remarkably, the biosynthesis of pyonitrin involves the non-enzymatic condensation of the NRPS-derived aeruginaldehyde and aminopyrrolnitrin [181].
RiPPs
In contrast to NRPSs, RiPP hybrids are rare. This infrequent occurrence might be due to the unique RiPP biosynthetic principles that make it inherently more difficult to form hybrids. Alternatively, it might be the result of a characterization bias, meaning that hybrid BGCs might have been bioinformatically identified but not experimentally validated. Microvionin is the first member of the lipolanthine RiPP family of lanthipeptide-fatty acid hybrids. It is composed of a triamino-dicarboxylic acid moiety, the so-called avionin residue (Fig. 3), and a bismethylated guanidino fatty acid (MGFA) at the N-terminus. The microvionin BGC harbors a total of ten genes in addition to those that encode enzymes responsible for lanthipeptide biosynthesis. As the corresponding proteins are homologues of FAS- or type II PKS-derived enzymes, a FAS/PKS-RiPP hybrid biosynthesis was postulated [138]. In addition, genome mining revealed more than 80 lipolanthine BGCs from actinobacterial genomes, leading to the classification of four lipolanthine subtypes. These BGC subtypes are distinguished by the presence of type I or type II PKS-related genes and lanD, which encodes a cysteine decarboxylase. Interestingly, the subtype I as well as an additional unique cluster harbor genes coding for NRPS-derived domains, indicating a putative PKS-NRPS-RiPP hybrid NP [139]. Mining the genome of Streptomyces sp. TP-A0584 resulted in the identification of yet another member of the lipolanthine family of PK/RiPP hybrids [182]. Goadvionin is an octapeptide that contains an avionin moiety and an N-terminally attached C32 acyl residue. The corresponding gdv BGC comprises a total of 22 genes that include genes involved in lanthipeptide biosynthesis, precursor supply, tailoring reactions, as well as type-I PKS, FAS, regulatory and transporter genes. Goadvionin is the first functionally characterized member of the lipolanthine family. 
A combination of heterologous expression, gene inactivation, in vivo and in vitro studies showed that lysine first undergoes methylation, followed by extension by the fatty acid biosynthetic enzymes encoded in the pathway. The fatty acid is subsequently transferred to and modified by the PKS component before it is loaded onto the avionin-containing peptide. In analogy to the selidamides, the PK-RiPP hybridization involves a member of the GNAT superfamily which catalyzes the condensation reaction between PK and RiPP moieties [137,182].
Engineering
In addition to identifying novel BGCs and characterizing the corresponding products, engineering of NRPS and RiPP pathways is an important means to increase structural diversity. Furthermore, dissection and engineering of individual pathway components open up the possibility to gain deeper biosynthetic insights. Engineering NRPS and RiPP pathways requires different strategies and techniques due to their fundamentally different biosynthetic principles. While NRPS engineering mainly focuses on manipulating the AA sequence of NRPs, RiPP engineering targets the variety of modification processes and their applicability to different core peptides.
The collinearity between the NRPS architecture and its product is predestined for engineering. The first successful attempts to manipulate the primary AA sequence of NRPs were described in 1995 [183]. At the time, the Marahiel lab was able to swap A-P didomains in the surfactin NRPS to obtain non-natural NRPs [183]. In the following years, many comparable approaches that relied on multiple domain exchanges were published (for a comprehensive review see Ref. [2]) (Fig. 5). In many cases, however, domain exchanges resulted in a decrease in peptide titers, while the diversity of the resulting NRPs remained limited. Since the identification of the A domain binding pocket and the subsequent deciphering of the Stachelhaus code, the A domain moved into the focus of engineering efforts [19,20]. Single or multiple AAs in the A domain's binding pocket were exchanged to alter its substrate specificity without drastic decreases in peptide titers (Fig. 5). However, using such minimally invasive approaches, the A domain selectivity only marginally changed regarding structure or polarity of the AAs [184]. Subsequent studies reported the incorporation of AA analogs like 3-methyl glutamine or O-propargyl-l-tyrosine using site-directed mutagenesis [185,186]. 
Furthermore, using a directed evolution approach, the specificity of an A domain was successfully changed from l-phenylalanine to l-alanine. During this approach, the non-conserved, specificity-conferring binding pocket residues of a promiscuous A domain were engineered via successive saturation mutagenesis [187]. Although many examples of successful NRPS engineering have been reported, universal principles were still lacking. Until 2018, a traditional NRPS module was defined as C-A-P(-E) (Fig. 5) [107,163,188]. With the introduction of the so-called exchange units (XUs) module boundaries were redefined [163]. Bode and co-workers used a conserved motif in the C-A linker as recombination site for their engineering efforts, thereby defining A-P-C(E) as a XU (Fig. 5) [163]. A shift by one domain to redefine module boundaries was also postulated for modular PKSs [[189], [190], [191], [192]]. Using XUs from Gram-positive and Gram-negative bacteria, functional NRPSs were generated that produced both known and artificial NRPs with only moderately reduced titers [163]. Since the XU concept was proposed to be limited by the downstream C domain specificity, the Bode lab subsequently refined the engineering strategy and developed the exchange unit condensation domain (XUC) concept [163,193]. Here, the fusion point is located in a flexible loop within the C domain. Thus, the C domain is divided into its N- and C-terminal subdomain, leading to the module architecture Csub-A-P-Csub for XUCs (Fig. 5). The advantage of this division is that the C domain maintains the original interface with the adjacent A domain [193]. Disadvantages are that two XUCs, one containing a C and the other containing a CE domain, cannot be combined. Furthermore, XUCs from different genera are not compatible [193]. Both limitations were attributed to structural differences of the C(E) domains, that prevent functional fusions of the N- and C-terminal subdomains [193]. 
However, the XU and the XUC concept can be combined to overcome their respective limitations. Functional NRPSs were generated using the different recombination sites of each concept [163,193]. Remarkably, both engineering strategies as well as multiple others strictly respect C domain specificities [163,[193], [194], [195]]. However, recent studies showed that C domains do not harbor binding pocket-like structures. Therefore, it was speculated that C domains do not exhibit a specificity [[196], [197], [198]]. Furthermore, based on a series of in vitro and in vivo experiments, the C-A interface was identified as a specificity-conferring structure with an “extended gatekeeping” function, which influences the A domain specificity [198]. Evolutionary analysis also revealed that NRPS diversity is based on recombination events restricted to the A(core) domain independent of the C domain [196,199,200]. Consequently, A(sub) domain exchanges were reevaluated for NRPS engineering. Using this A(sub) domain exchange strategy, multiple NRPSs were successfully engineered, yet no universal rules have been established to date [196]. Prior to the new focus on A domain engineering, comparable results had been obtained in previous studies using A(sub) domain swaps (Fig. 5) [35,201,202]. Taken together, these results indicate that domain interfaces, in particular of the C-A didomain, are more crucial for a functional module than previously expected. For a better understanding, more structural data is required to compare interfaces of NRPSs with low homology and to identify crucial structural motifs.
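The difference between the classical module definition (C-A-P) and the exchange unit definition (A-P-C) described above amounts to cutting the same linear domain string at different boundaries. The following minimal sketch illustrates this with a hypothetical three-module assembly line. The domain labels, the example architecture and the function name are assumptions made for illustration; the XUC concept, which places the fusion point inside the C domain, is deliberately not modeled here.

```python
from typing import List


def split_units(domains: List[str], start: str) -> List[List[str]]:
    """Cut a linear domain list into units, opening a new unit at every `start` domain."""
    units: List[List[str]] = []
    for domain in domains:
        if domain == start or not units:
            units.append([domain])    # open a new unit
        else:
            units[-1].append(domain)  # extend the current unit
    return units


# Hypothetical assembly line: an initiation module without C domain, two elongation
# modules, and a terminal thioesterase (labelled "TE" here).
assembly_line = ["A", "P", "C", "A", "P", "C", "A", "P", "TE"]

classical_modules = split_units(assembly_line, "C")  # boundaries in front of each C
exchange_units = split_units(assembly_line, "A")     # boundaries in the C-A linker

print(classical_modules)
print(exchange_units)
```

In this toy representation, each exchange unit carries the C domain that follows its A-P didomain, which mirrors the statement above that the XU concept is limited by the specificity of the downstream C domain.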
Fig. 5
NRPS module definitions and engineering strategies. (A) Classical and XU/XUC-based definition of module boundaries. (B) NRPS engineering strategies: 1. A domain engineering targeting the binding pocket 2. (Multiple) Domain substitutions 3. Exchange unit strategies. Enzymatic domains are color coded according to the AA incorporated by the respective module. Red coloring is used to highlight engineered parts. A: Adenylation domain; P: Peptidyl carrier protein; C: Condensation domain; T: Thioesterase domain. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Cloning strategies for the engineering of NRPSs are complicated which can be attributed to the size of the corresponding genes and the homology of domains in every module. With the definition of engineering concepts, the investigation of high-throughput cloning and recombination strategies was the logical next step to increase NRP chemical space. Naturally, the interaction between multiple NRPSs is mediated via N- and C-terminal docking or communication-mediating domains [203]. These domains can also be introduced into artificially split NRPSs without losing catalytic activity [203,204]. Recently, a study combined the XU concept to artificially split native single protein NRPSs with the introduction of synthetic zippers instead of docking domains to simplify the NRPS engineering workflow [205]. The cloning of the artificially split NRPS subunits on different plasmids allows the efficient recombination of different subunits to generate large peptide libraries in vivo [205,206]. In a comparable approach, zinc finger proteins, which bind specifically to 9 base pair DNA motifs, were used to modify the gramicidin S NRPS. The NRPS was split into stand-alone modules and different zinc fingers were added to each module. 
In combination with a DNA scaffold, containing the binding sites of the zinc fingers, a functional DNA-templated assembly line was obtained [207].
Insights from these combined efforts have increased our understanding of how NRPSs can be engineered. Despite these advancements, truly universal recombination concepts have yet to be established that allow the recombination of any conceivable module architecture beyond phylum boundaries. However, the combination of structural insights, especially into domain interfaces, evolutionary analysis and time-saving high-throughput cloning methods could pave the way towards more efficient and universal engineering strategies.
Leader peptide, core peptide, PTM enzymes and proteases are the four components of RiPP biosynthesis. The focus of RiPP engineering efforts lies on the modification of the core peptide sequence and the recombination of PTM enzymes through leader peptide engineering.
Instead of exchanging modules or domains as it is the case for NRPSs, the primary sequence of RiPPs can be altered by changing the gene sequence associated with the core peptide [8,208]. The challenge here is that the modifying enzymes must still be able to modify the engineered core peptide. Many studies targeted the core peptide of various RiPP families to generate artificial libraries using site-directed and random mutagenesis (Fig. 6A) [[209], [210], [211], [212]]. ProcM, a promiscuous lanthipeptide synthetase, has been used for the in vivo production of a precursor gene-encoded library of 10⁶ lanthipeptides. Screening the obtained library resulted in the identification of a HIV p6 protein-human TSG101 protein interaction inhibitor [213]. This approach is, however, limited as not all biosynthetic enzymes exhibit natural substrate promiscuity. Therefore, studies focused on the engineering of biosynthetic enzymes to expand their substrate promiscuity. Enzyme libraries were generated using random mutagenesis and screened for tailoring enzymes with high substrate tolerance. For example, a dehydratase mutant library of NisB with 10⁵ variants was generated via error-prone PCR. Subsequent high-throughput screening based on cell surface display of the peptide products revealed a NisB variant that showed substrate flexibility towards non-natural substrates [214]. Moreover, covalent fusion of PTM enzymes with their cognate leader peptides has been shown to achieve substrate promiscuity and tolerance towards non-natural peptides. Microviridin variants were successfully generated using leader peptides linked with the two ATP-grasp ligases MvdD and MvdC that were subsequently used to modify core peptide libraries [215]. Although engineering of the core peptides as well as modifying enzymes has been shown to increase structural diversity and improve bioactivity of RiPPs, the structural variety of the produced peptides is still limited.
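Because the primary sequence of a RiPP is gene-encoded, the size of a core peptide library follows directly from the number of positions that are varied. The sketch below enumerates all single-site substitution variants of a core peptide and computes the theoretical size of a fully randomized library. The core sequence is hypothetical, the function names are invented for illustration, and only the 20 proteinogenic AAs are considered; the code makes no statement about which variants a given PTM enzyme would still accept.

```python
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic AAs (one-letter code)


def single_site_variants(core: str) -> Iterator[str]:
    """Yield every variant that differs from `core` at exactly one position."""
    for i, wild_type in enumerate(core):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield core[:i] + aa + core[i + 1:]


def library_size(n_randomized_positions: int) -> int:
    """Theoretical number of sequences when n positions are fully randomized."""
    return len(AMINO_ACIDS) ** n_randomized_positions


core = "ITSISLCT"  # hypothetical core peptide used only for illustration
variants = list(single_site_variants(core))
print(len(variants), "single-site variants")
print(library_size(4), "and", library_size(5), "sequences for 4 and 5 randomized positions")
```

Under these assumptions, fully randomizing only a handful of positions already yields libraries in the range of hundreds of thousands to millions of sequences, which is why promiscuous modifying enzymes and high-throughput screens are central to the approaches described above.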
Fig. 6
Overview of RiPP engineering strategies. (A) Core peptide engineering by site-directed/random mutagenesis to create core peptide libraries. (B) Combinatorial biosynthesis to create new-to-nature RiPP products. (1) In vitro production of hybrid RiPPs using enzymes from different RiPP pathways. (2) Leader peptide engineering. The combination of two recognition sequences (RS) of PTM enzymes from different RiPP families allows in vivo production of hybrid RiPP products. (3) Sortase A (StrA)-mediated leader peptide exchange (LPX) strategy. In vitro production of RiPP hybrids by swapping leader peptides to allow core peptide modifications by different tailoring enzymes from different RiPP families.
As an alternative to core peptide and PTM enzyme engineering, combinatorial approaches for RiPP biosynthesis have been developed. Here, PTM enzymes from different pathways are used to modify a single core peptide (Fig. 6B). Using this approach, new-to-nature RiPPs were generated, combining characteristic features of two or more RiPP pathways in one product. Examples include thiazoline-containing, prenylated and macrocyclized derivatives of cyanobactins that have been produced in vitro using a set of PTM enzymes from different cyanobactin pathways [216]. To further expand the toolset of modifying enzymes that can modify a core peptide beyond RiPP family boundaries, leader peptide engineering moved into the focus of subsequent studies [1,8]. As RiPP systems follow a leader peptide-guided biosynthetic logic, the exchange of leader peptides enables the modification of the core peptide via PTM enzymes from different pathways within the same RiPP family or beyond family boundaries. The fusion of a class I lantibiotic leader peptide with a class II core peptide resulted in the production of a chimeric lanthipeptide. The resulting RiPP harbors multiple dehydrated serine and/or threonine residues as well as (methyl)lanthionine bridges [217].
Based on the observation that many PTM enzymes recognize their substrate through specific RSs located in the leader/follower peptide [218], Mitchell and co-workers further developed the chimeric leader peptide strategy. Chimeric leader peptides were generated via fusion of RSs from different leader peptides within the same precursor peptide. As a consequence, PTM enzymes from different pathways could be used to modify the same core peptide (Fig. 6). Using this strategy, thiazoline−sactipeptide and thiazoline−lanthipeptide hybrid RiPPs were successfully produced [219]. However, this approach requires detailed knowledge of the specific RSs. As an alternative to the generation of chimeric leader peptides, a leader peptide exchange (LPX) strategy was developed. In this approach, the leader peptide is enzymatically swapped after each round of modification using a sortase (StrA) [220]. StrA is a transpeptidase that catalyzes the cleavage of the peptide bond between threonine and glycine in the LPXTG motif. Subsequently, the C-terminal threonine residue is linked to the amino group of glycine of the new leader peptide (Fig. 6) [221]. The so-called StrA-based LPX technique has been used to generate novel RiPP products using PTM enzymes from cyanobactin and microviridin pathways. To do so, the leader peptide that is recognized by the cyanobactin heterocyclase (LynD) was fused with the StrA recognition motif and the microviridin J core peptide (MdnA). Following heterocyclization by LynD, the leader peptide was exchanged with the leader peptide that is recognized by the ATP-grasp ligase MdnC. Modifications by both enzymes resulted in a RiPP that harbored two thiazoline and two ω–ester cross-links [220].
The realization that both NRPS and RiPP pathways produce complex peptide NPs that can harbor largely overlapping AA modifications resulted in the hypothesis that NRPS-derived peptides can be biosynthesized using synthetic RiPP pathways and vice versa.
In particular, the production of NRPs using RiPP pathways has attracted a lot of interest from the scientific community, as RiPP primary sequences can be altered easily and large libraries of structural analogs can hence be generated for subsequent structure-activity relationship studies or to improve the physico-chemical properties of peptides. The first steps towards this goal were recently taken when a lanthipeptide BGC was designed to produce RiPPs that mimic the structure of the NRPS-derived antimicrobial peptide brevicidine. Brevicidine is a cyclic depsipeptide containing N-acylated, positively charged non-canonical AA residues in the peptide backbone and a lactone ring at the C-terminus. To mimic brevicidine using the tailor-made RiPP pathway, ᴅ-AAs were replaced with ʟ-AAs, positively charged ornithines were replaced with lysine and a methyllanthionine ring was installed to mimic the lactone ring of brevicidine. For the formation of the methyllanthionine bridge, the C-terminal AA was replaced with cysteine to facilitate ring formation. The designed core peptides were fused with nisin leader peptides and co-expressed with the genes involved in peptide modification and cleavage. Even though the exact brevicidine was not produced using this strategy, the structurally similar lanthipeptides showed bioactivity against Gram-negative bacteria [222]. This study can serve as a proof-of-principle that NRP-like peptides can be obtained from tailor-made RiPP BGCs. These results open up an entirely new avenue for peptide NP engineering, suggesting that complex peptide NPs can be biosynthesized employing the respective bioorthogonal biosynthetic route (Fig. 7).
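The three substitution rules described above (ᴅ-AAs to ʟ-AAs, ornithine to lysine, C-terminal residue to cysteine) can be written down as a small Python sketch. This is only an illustration of the design logic, not the procedure used in [222]; the function name `nrp_to_ripp_core` and the toy input sequence are hypothetical, and the toy sequence is deliberately not the brevicidine sequence.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def nrp_to_ripp_core(residues):
    """Translate an NRP residue list (three-letter codes, optional 'D-'
    prefix, 'Orn' allowed) into a ribosomally encodable core peptide."""
    if not residues:
        raise ValueError("empty peptide")
    core = []
    for res in residues:
        if res.startswith("D-"):
            res = res[2:]      # rule 1: D-amino acid -> L-amino acid
        if res == "Orn":
            res = "Lys"        # rule 2: ornithine -> lysine
        core.append(res)
    core[-1] = "Cys"           # rule 3: C-terminal residue -> cysteine
    return "".join(THREE_TO_ONE[r] for r in core)


# Hypothetical toy NRP (assumption for illustration only).
toy_nrp = ["D-Ala", "Orn", "D-Orn", "Gly", "Thr", "Leu", "Ser"]
print(nrp_to_ripp_core(toy_nrp))  # AKKGTLC
```

The resulting one-letter string could then be back-translated into a precursor gene and fused to a leader peptide; that step, and any N-acylation of the product, is outside the scope of this sketch.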
Fig. 7
Production of complex peptide NPs using tailor made NRPS or RiPP biosynthetic pathways. NRPS and RiPP BGCs encode biosynthetic enzymes that utilize different strategies for the production of peptides with a complex and overlapping spectrum of peptide modifications. This overlapping arsenal of peptide modifications opens up an entirely new avenue: the production of NRPS-derived peptides using the ribosomal route and vice versa.
Conclusion
NRPSs are large multi-enzyme complexes that biosynthesize NRPs in a modular assembly line-like fashion. The chemical diversity encountered in NRPs is based on the incorporation of a large variety of (unusual) amino and carboxylic acids, the presence of module-encoded facultative enzymatic domains, on- and off-line tailoring reactions catalyzed by trans-acting enzymes and the fusion with other biosynthetic pathways. Moreover, novel scaffolds have evolved through recombination of serial module arrangements from different biosynthetic pathways. RiPP BGCs, on the other hand, are small and encode monofunctional enzymes. Since RiPP precursor peptides are restricted to the set of 20 proteinogenic AAs, the conversion into heavily modified non-proteinogenic AAs is achieved through an ever-increasing toolbox of RiPP modifying enzymes. Despite these fundamentally different biosynthetic strategies, the arsenal of AA modifications of both biosynthetic routes is overlapping. The biochemical strategies that result in the same set of modifications, however, can vary extensively. As a result of the different biosynthetic strategies, genome-based identification and engineering strategies vary.
Deciphering of universal NRPS biosynthetic principles resulted in the development of several generations of highly sophisticated genome mining platforms that can be applied to all families of linear, non-iterative NRPS systems. State-of-the-art NRPS genome mining tools predominantly rely on hard-coded biosynthetic rules. The universal biosynthetic principles are applied to annotate the canonical set of module-encoded domains. Structures of the associated peptides can be predicted based on the A domain's substrate specificity and the colinearity principle. Since some A domains are promiscuous and can incorporate multiple AAs, the accurate peptide sequence can, in many cases, not be predicted with high confidence.
As a consequence, NRPSs usually produce small libraries of related peptides rather than a single NRP. The presence of module-encoded facultative enzymatic domains, on the other hand, allows the regiospecific prediction of modification reactions. Even though universal biosynthetic principles have also been established for RiPP biosynthesis, these rules cannot be used to chart the full RiPP biosynthetic space. RiPP precursor genes and genes encoding characteristic modifying enzymes are conserved within a RiPP family. Since this conservation is usually restricted to a single family, tools have been developed for the annotation of individual RiPP families. The majority of these tools rely on the hard-coded, homology-based detection of BGCs of one RiPP family. Machine learning-based approaches, on the other hand, have shown a lot of promise for the identification of putative RiPP BGCs beyond family boundaries. In the case of characterized RiPP families, the core peptide sequence can be predicted with high confidence. The regioselectivity of the modifying enzymes and the number of modifications introduced by a modifying enzyme, however, are currently not predictable.
The fundamentally different biosynthetic principles of the two bioorthogonal routes are also reflected in their respective engineering strategies. NRPS engineering focuses on the manipulation of specificity-conferring components of the assembly line to alter the primary sequence of the NRP. These concepts range from minimally invasive methods that change the substrate specificity of single A domains to the exchange and recombination of module series. Especially the latter approach can be transformed into the high-throughput engineering of large numbers of NRPS BGCs. Changing the core peptide sequence in RiPPs can be achieved by simple alterations of three bases in the precursor gene sequence.
While these simple changes can be used to generate large peptide libraries with varying core peptide sequences, modifying enzymes might no longer be able to cope with the changed peptide sequence. As a result, the leader peptide has been the focus of many engineering approaches. Leader peptide swapping, the generation of chimeric leader peptides and the fusion of leader peptides are just a few examples of how modifying enzymes from other RiPP families were successfully recruited to modify non-natural core peptide sequences. The ever-increasing number of characterized RiPP families, which coincides with the constant expansion of the toolset of RiPP modifying enzymes, is paving the way towards the design of chimeric RiPP BGCs for the production of tailor-made peptides, including those that were initially reported to be of NRPS origin.
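The colinearity-based prediction summarized in this section, and the reason promiscuous A domains lead to small libraries of related peptides, can be sketched in a few lines of Python. Each module is represented by the list of substrates its A domain is predicted to accept, and the predicted products are all combinations in assembly line order. The function name `predict_products` and the example modules are hypothetical and do not reproduce the logic of any specific genome mining tool.

```python
from itertools import product


def predict_products(modules):
    """Enumerate linear peptide products under the colinearity principle.

    `modules` is a list (in assembly line order) of lists of candidate
    substrates accepted by the A domain of each module."""
    return ["-".join(combo) for combo in product(*modules)]


# Hypothetical four-module NRPS (assumption); modules 2 and 4 are
# modelled as promiscuous and each accept two substrates.
modules = [["Val"], ["Leu", "Ile"], ["Orn"], ["Phe", "Tyr"]]
products = predict_products(modules)
for peptide in products:
    print(peptide)
# Val-Leu-Orn-Phe
# Val-Leu-Orn-Tyr
# Val-Ile-Orn-Phe
# Val-Ile-Orn-Tyr
```

Two promiscuous modules with two substrates each already give four candidate products, which illustrates why a single accurate peptide sequence often cannot be predicted with high confidence.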
CRediT authorship contribution statement
Sebastian L. Wenski: Conceptualization, Visualization, Writing – original draft, Writing – review & editing. Sirinthra Thiengmag: Conceptualization, Visualization, Writing – original draft, Writing – review & editing. Eric J.N. Helfrich: Conceptualization, Writing – original draft, Writing – review & editing, Supervision, Funding acquisition, Project administration.