
Linking Biosynthetic Gene Clusters to their Metabolites via Pathway-Targeted Molecular Networking.

Abstract

The connection of microbial biosynthetic gene clusters to the small molecule metabolites they encode is central to the discovery and characterization of new metabolic pathways with ecological and pharmacological potential. With increasing microbial genome sequence information being deposited into publicly available databases, it is clear that microbes have the coding capacity for many more biologically active small molecules than previously realized. Of increasing interest are the small molecules encoded by the human microbiome, as these metabolites likely mediate a variety of currently uncharacterized human-microbe interactions that influence health and disease. In this mini-review, we describe the ongoing biosynthetic, structural, and functional characterizations of the genotoxic colibactin pathway in gut bacteria as a thematic example of linking biosynthetic gene clusters to their metabolites. We also highlight other natural products that are produced through analogous biosynthetic logic and comment on some current disconnects between bioinformatics predictions and experimental structural characterizations. Lastly, we describe the use of pathway-targeted molecular networking as a tool to characterize secondary metabolic pathways within complex metabolomes and to aid in downstream metabolite structural elucidation efforts.


Substances：
Biological Products

Year: 2016 PMID： 26456470 PMCID： PMC5055756 DOI： 10.2174/1568026616666151012111046

Source DB: PubMed Journal: Curr Top Med Chem ISSN： 1568-0266 Impact factor: 3.295

INTRODUCTION

Secondary metabolites have long served as inspirational structural and functional scaffolds for the development of new-in-class pharmaceuticals [1-2]. A longstanding era of secondary metabolite discovery followed the discovery of penicillin in the 1920s, and by the 1990s, approximately 80% of commercial drugs were natural products or natural product derivatives [3-4]. This percentage has decreased over the last few decades due to the expansion of combinatorial synthetic methods and an increase in the rediscovery rates of natural products through traditional discovery campaigns. However, with the continued expansion of microbial genome and metagenome sequence information, a resurgence in interdisciplinary academic and industrial natural product discovery campaigns is well underway [5-11]. Several major challenges exist for the discovery of new microbial natural product-derived drug leads, such as: 1) our inability to culture the majority of microbes from environmental samples (e.g., “the great plate count anomaly”) [12-17]; 2) our general lack of robust tools to broadly activate bioactive small molecule production from diverse “silent” pathways in the microbes (or heterologous expression hosts) that we can readily cultivate in the lab [18-21]; and 3) our inefficiencies in quickly identifying and dereplicating unknown metabolites from expressed pathways with often unpredictable structural and functional properties [22]. As a result, there is a continued general need for the development of interdisciplinary approaches to link “orphan” biosynthetic gene clusters to the bioactive small molecules they produce. Multipartite animal-microbe symbioses have provided rich sources of novel bacterial small molecules that naturally function in animal environments, enhancing their pharmacological potential [23-28]. By understanding the ecological niche, new small molecules can be stimulated, discovered, and investigated with overarching ecological and functional contexts. 
Indeed, since the turn of the century, we have come to appreciate humans and all other animals as being “superorganisms” [29]. Our resident microbes, the human microbiota, are one such source and have emerged as prominent players in regulating both human health and disease, and thus their metabolite coding capacity, the human microbiome, is a rich reservoir of potential clinically-relevant small molecules [30-32]. The microbiota affects the host through various mechanisms including the exchange of nutrients, regulation of the immune system, protection from pathogens, and metabolism of indigestible compounds [26, 33-34]. Because of the importance of this symbiotic relationship, dysregulation of the microbiota communities (dysbiosis) has been correlated with the onset of serious health issues, including obesity, diabetes, inflammatory bowel diseases, and cancers [35-40]. Small molecule metabolites regularly mediate host-microbe interactions. Yet despite the ecological importance of microbial natural products, we know very little about the structures and functional roles of these compounds and how they affect human health. (Meta)genomics-guided approaches have started to shed light on the extent of this question. Sequencing of the human microbiota [41-42] revealed that human-associated bacteria encode a wide diversity of biosynthetic gene clusters, with over 3,000 biosynthetic gene clusters being widely distributed among the sequenced microbiota of healthy individuals [30]. Much of the chemical diversity encompassed by the small molecule products of these gene clusters is found in bacteria that are associated with the oral and gut cavities. The majority of these compounds have not yet been characterized. One of the more heavily studied biosynthetic gene clusters from the microbiome is the colibactin pathway [43]. The colibactin gene cluster is a ~55 kb biosynthetic gene cluster that produces a family of polyketide-nonribosomal peptide hybrid molecules. 
This gene cluster is found among the Enterobacteriaceae, including Escherichia coli, Citrobacter koseri, Klebsiella pneumoniae, and Enterobacter aerogenes [44]. Additionally, the gene cluster has been discovered in the microbiota of infected coral [45] and of honeybees exhibiting an intestinal scab phenotype [46-47]. Bacteria expressing the pathway induce DNA double strand breaks and cause genomic instability of mammalian cells [48-49]. The presence of this gene cluster is epidemiologically associated with long-term persistence in the host [50]. Under inflammatory conditions, such as in inflammatory bowel disease (IBD), Enterobacteriaceae members containing this gene cluster proliferate [51]. As a result of the cytotoxicity exhibited by the small molecules from this pathway, the colibactin pathway has been directly linked to colorectal tumorigenesis in colitis mouse models [38-39, 52]. However, other strains containing the colibactin cluster, such as E. coli Nissle 1917, paradoxically have also been demonstrated to exhibit probiotic effects for patients with ulcerative colitis [53]. Gaining mechanistic insights into these functional disconnects remains the subject of ongoing investigations. Mechanistic understanding of the phenotypes exhibited by this pathway has been hindered by the lack of colibactin structural information. Fortunately, structural and small molecule functional data are starting to emerge, providing new vantage points to experimentally elucidate the mechanistic underpinnings for the various colibactin pathway functions [54-61]. In this mini-review, we focus on the colibactin pathway as a central thematic example of linking biosynthetic gene clusters to the small molecules they produce and draw connections to other pathways invoking related biosynthetic logic. 
We highlight “pathway-targeted” molecular networking as one approach to more finely map expressed secondary metabolic pathways within complex metabolomes to aid in secondary metabolite identification and characterization [62]. Lastly, we discuss a few of the disconnects between secondary metabolite structure and biosynthetic predictions as illustrative examples for the continued need of enzymological characterizations of orphan biosynthetic gene clusters [63].

Genomics-guided secondary metabolite discovery

The “structure first,” then hunt for its responsible gene cluster paradigm, is transitioning to “sequence first,” then hunt for the many possible products encoded in the (meta)genomic information. Genes-to-molecules discovery approaches inherently reduce rediscovery rates of known metabolites, as novel gene clusters, i.e., novel “biosynthetic codes,” are selected as the genetic source materials. Briefly, natural products of the polyketide and the nonribosomal peptide families, for example, are often biosynthesized according to a biosynthetic code [64-66]. Many of these biosynthetic systems follow a “co-linearity” rule, in which the organization of domains dictates the order of biosynthetic operations in the pathway. Because of this logic and the wealth of mechanistic enzymology knowledge in this area, it is possible to predict with some accuracy the possible core structure(s) of assembly line-derived polyketides and nonribosomal peptides [64-66]. New bioinformatic programs such as antiSMASH [67-70] and ClusterFinder [71] have integrated a variety of existing bioinformatic algorithms and have largely automated the process of finding novel secondary metabolite gene clusters and predicting possible core structures. However, many nonlinear and iterative pathways among other confounding factors, such as hypothetical proteins serving as novel biocatalysts, necessitate a continued need for “biosynthetic code breaking.” Indeed, about half of the genes in the human microbiome are listed as hypotheticals and are completely unknown [72]. A portion of these genes will undoubtedly contribute to novel small molecule metabolism. Additionally, many putative secondary metabolite pathways are emerging in genome databases that contain proteins with no significant similarity to previously reported pathways and are not detected in algorithms that rely on currently known pathways as inputs, raising genome-guided opportunities for the discovery of new small molecule classes.
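The co-linearity rule described above can be illustrated with a minimal sketch. It assumes each module contributes exactly one building block in module order, and it uses hypothetical module names and placeholder substrate labels (`aa1`, `aa2`, ...) rather than real predictions from antiSMASH or any other tool. The second helper simply counts modules that the observed product size does not account for, which is the kind of gap that can point to inactive modules, iterative use, or downstream processing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Module:
    """One assembly-line module (names and substrates here are hypothetical)."""
    name: str
    predicted_substrate: Optional[str]  # None when no confident prediction exists


def predict_core(modules: List[Module]) -> List[str]:
    """Co-linearity rule: one building block per module, in module order."""
    return [m.predicted_substrate or "unknown" for m in modules]


def unexplained_modules(modules: List[Module], observed_units: int) -> int:
    """Number of modules not accounted for by the observed product size."""
    return len(modules) - observed_units


# Hypothetical five-module NRPS with placeholder substrate labels.
assembly_line = [
    Module("M1", "aa1"),
    Module("M2", "aa2"),
    Module("M3", None),  # e.g., a module whose substrate cannot be predicted
    Module("M4", "aa4"),
    Module("M5", "aa5"),
]
predicted = predict_core(assembly_line)
gap = unexplained_modules(assembly_line, observed_units=3)
```

A nonzero `gap` does not say which explanation applies; it only flags that the predicted and observed structures disagree and that experimental follow-up is needed.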

Pathway-targeted molecular networking

When comparing experimental conditions, A versus B, metabolomics groups often make functional claims based on the responses of molecules that can be mapped to established external and internal databases. Without question, many functional insights have been gained from these approaches, but what about all of the product ions that are not found in any current database? In secondary metabolite discovery operations, mapping small molecules to known databases most often fails, as novel metabolites by definition have not yet been characterized. Foundational approaches are emerging that are beginning to address this key bottleneck in microbial secondary metabolism [73]. Specifically, molecular networking techniques enhance diverse investigations of secondary metabolite discovery, regulation, and their functional roles by interconnecting structurally related molecules in silico [74]. This unbiased approach scores tandem MS (MS2) spectra based on small molecule fragmentation similarities. The molecules are then represented in a molecular network as interconnected nodes based on fragmentation relationships [74]. Using this method, an individual node, or “molecular feature” (MoF) groups with similar MoFs, forming structurally related clusters, or “molecular families.” Molecular networking has found many recent uses in investigating metabolic responses from individual microorganisms to complex cell-to-cell interactions. For example, coupling nanospray desorption electrospray ionization (nanoDESI) MS with molecular networking, the metabolic status of living microbial colonies has been characterized [74]. By growing the colony next to a competing species, the metabolic response of the microbial colony can be assessed. From the MS/MS network, small molecules that are stimulated by the challenge can be grouped into general families of metabolites. 
Consequently, if a novel molecule is not yet in a database but falls within a known molecular family in the network, structural information can more readily be proposed. These foundational methods accelerate natural product dereplication approaches [22], facilitate the study of intraspecies, interspecies, and even interkingdom interactions [75-76], and significantly aid in the decoding of “orphan” biosynthetic gene clusters and the “cryptic” small molecules they produce [77]. When collecting high-resolution tandem MS data in untargeted fragmentation modes, the subsequent molecular networks generated from these data are enriched in the more abundant molecules – primary metabolites, media components, and abundant secondary metabolites – and often lack less abundant molecules encoded by the pathway of interest. To address this roadblock, we conducted a modified “pathway-targeted” molecular networking strategy for the colibactin pathway that ties into the existing untargeted networking frameworks and resources [56]. In our modified strategy, we first collect high-resolution untargeted metabolomics data (MS) in isogenic wildtype and mutant strains to identify parent ions dependent on the presence of the secondary metabolic pathway, which can be cleanly deleted or inserted into a heterologous host without significant effects on cell growth. Then, we run a tandem MS experiment focusing on the fragmentation of only those unique pathway-dependent features for molecular networking. Multiple tandem MS datasets can be pooled if needed to enhance overall pathway-targeted network coverage. To illustrate the output of this approach, Fig. (1) compares an untargeted and a pathway-targeted molecular network from E. coli heterologously expressing an example secondary metabolic pathway. We recently applied this approach to the bacterial colibactin pathway found in select strains of E. coli and elsewhere [56-57, 62]. 
The result was a focused network map of the colibactin pathway that contained critically important, lower abundance ions that were missed in untargeted fragmentation modes. Because pathway intermediates and products inherently share structural similarities, the networks from freshly prepared organic extracts greatly facilitated the structural predictions of colibactin pathway-dependent molecules in the network, some of which were unstable and decomposed during chemical processing, relative to a handful of stable pathway-dependent reference molecules extensively characterized by NMR and/or synthesis. We have found this approach to be more generally applicable in conducting detailed systems-level biosynthetic analyses of secondary metabolic pathways, such as determining genetic mutation consequences at the metabolic network level (pinpointing bottlenecks in secondary metabolism) and assessing NRPS amino acid substrate specificities at the pathway level using a combination of system-wide isotopic labeling studies and pathway-targeted molecular networking.
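The networking idea described in this section, scoring MS2 spectra by fragmentation similarity and grouping connected nodes into molecular families, can be sketched as follows. This is a simplified illustration and not the published workflow: it assumes a plain cosine score on binned fragment intensities, whereas established molecular networking tools use more elaborate scoring. The bin width, the 0.7 threshold, and the three example spectra are all invented for the example.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=0.5):
    """Sum fragment intensities into fixed-width m/z bins; peaks = [(mz, intensity), ...]."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a, b):
    """Plain cosine similarity between two binned spectra (assumed scoring scheme)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(spectra, threshold=0.7):
    """Return edges (name_a, name_b, score) for spectrum pairs scoring at or above threshold."""
    binned = {name: bin_spectrum(peaks) for name, peaks in spectra.items()}
    edges = []
    for x, y in combinations(sorted(binned), 2):
        score = cosine(binned[x], binned[y])
        if score >= threshold:
            edges.append((x, y, score))
    return edges


def molecular_families(names, edges):
    """Group nodes into connected components ("molecular families")."""
    adjacency = {n: set() for n in names}
    for x, y, _ in edges:
        adjacency[x].add(y)
        adjacency[y].add(x)
    seen, families = set(), []
    for n in sorted(names):
        if n in seen:
            continue
        stack, family = [n], set()
        while stack:
            current = stack.pop()
            if current in family:
                continue
            family.add(current)
            stack.extend(adjacency[current] - family)
        seen |= family
        families.append(sorted(family))
    return families


# Invented example spectra: MoF_1 and MoF_2 share fragments, MoF_3 does not.
spectra = {
    "MoF_1": [(100.1, 50.0), (150.2, 100.0), (200.3, 30.0)],
    "MoF_2": [(100.1, 45.0), (150.2, 90.0), (200.3, 35.0)],
    "MoF_3": [(80.4, 100.0), (310.6, 60.0)],
}
edges = build_network(spectra)
families = molecular_families(spectra.keys(), edges)
```

In this toy input, the two features with shared fragments fall into one family and the unrelated feature remains a singleton, which mirrors how an uncharacterized molecule can be placed next to characterized relatives in a network.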

Fig. (1)

Pathway-targeted molecular networking of gene clusters links genes to the molecules they produce.

While pathway-targeted molecular networking has the potential to aid in natural product discovery efforts, there are a few limitations that must be considered. Pathway-targeted comparative metabolomics requires the ability to acquire spectra from both a producing and a non-producing strain (comparison of a strain with a functionally expressed pathway versus a strain lacking a functional secondary metabolic pathway). Thus, a successful experiment requires that the pathway is not “silent” under laboratory growth conditions and that the producing host is genetically tractable, or that the pathway can be transferred and functionally expressed in a heterologous host. Despite these limitations, pathway-targeted molecular networking is a useful tool in metabolite discovery efforts and can be coupled to emerging synthetic biology techniques for pathway activation. We discuss two biosynthetic events found in the colibactin pathway – hydrolytic maturation of secondary metabolites and the incorporation of unexpected amino acids – and how pathway-targeted molecular networking has aided and continues to aid in the elucidation of diverse structures from this important pathway.
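The producing versus non-producing comparison at the heart of the pathway-targeted strategy can be sketched as a simple feature filter. This is a hedged illustration only: the feature tuples, the tolerances (`ppm_tol`, `rt_tol`), the intensity cutoff, and every m/z value below are hypothetical and are not taken from the colibactin studies. The output is the list of parent ions that would be passed to a targeted tandem MS experiment.

```python
def ppm_difference(mz_a, mz_b):
    """Mass difference between two m/z values in parts per million."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def pathway_dependent_features(producer, nonproducer,
                               ppm_tol=5.0, rt_tol=0.2, min_intensity=1e4):
    """Keep producer features with no m/z and retention-time match in the non-producer.

    Each feature is a tuple (mz, retention_time_min, intensity).
    All tolerances are assumed values chosen for illustration.
    """
    targets = []
    for mz, rt, intensity in producer:
        if intensity < min_intensity:
            continue  # too weak to trust as a pathway-dependent signal
        matched = any(
            ppm_difference(mz, other_mz) <= ppm_tol and abs(rt - other_rt) <= rt_tol
            for other_mz, other_rt, _ in nonproducer
        )
        if not matched:
            targets.append((mz, rt))
    return targets


# Invented feature tables for a pathway-positive and a pathway-negative strain.
producer_features = [
    (300.1000, 5.0, 5e5),    # also present in the non-producer (background)
    (343.2600, 12.1, 2e5),   # only in the producer (candidate pathway product)
    (500.3000, 8.0, 5e3),    # only in the producer but below the intensity cutoff
]
nonproducer_features = [
    (300.1005, 5.05, 4e5),
]
inclusion_list = pathway_dependent_features(producer_features, nonproducer_features)
```

A filter of this kind only works when both strains can be grown and the pathway is expressed, which is the limitation noted above.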

Hydrolytic maturation of secondary metabolites

More and more examples have emerged where sequencing of a biosynthetic gene cluster leads to a predicted biosynthetic code that does not match its expected product. One important mechanism underlying this disconnect is in the hydrolytic maturation of nonribosomal peptides and hybrids thereof, in which a larger precursor is cleaved to form smaller constituents [78]. A growing number of natural products fall into this category, including colibactin. Analogous proteolytic events occur during the maturation of many ribosomally synthesized and post-translationally modified peptides (RiPPs) from larger precursor peptides [79]. An important nonribosomal peptide example is in the biosynthesis of the prototypical monocyclic β-lactam antibiotic family, the nocardicins, which are broad spectrum antibacterials first described in the 1970s [80]. When Townsend and co-workers reported the biosynthetic gene cluster for nocardicin in 2004 through a structure-guided sequencing approach, they noted that the gene cluster encoded five modules for the predicted production of a pentapeptide, but nocardicin A consisted of only a modified tripeptide sequence [81]. The group proposed that either the pathway contained inactive modules, which was the favored mechanism at the time, or engaged in proteolytic processing to explain the discrepancy between the number of modules in the biosynthetic gene cluster and the number of amino acids in the final structure of the mature antibiotic [81]. Because a candidate protease was not found in the gene cluster, experimental support for the less favored proteolytic mechanism came later through a model protease cleavage assay [82] and biochemical analysis of individual catalytic domains [83-84]. Hydrolytic processing of polyketide synthase/nonribosomal peptide synthetase (PKS/NRPS) hybrid products was proposed as the primary route for zwittermicin biosynthesis [85]. 
The structure of zwittermicin was first described in 1994 [86], but it was not until the gene cluster was identified in 2009 that it was proposed that zwittermicin is one of two major metabolites cleaved from a larger precursor molecule [85]. This precursor, prezwittermicin A, was proposed to be capped with an N-terminal N-acyl-D-asparagine. A transmembrane peptidase, ZmaM, which is encoded in the gene cluster, was then proposed to cleave the N-acyl-D-asparagine during export from the cell, releasing mature zwittermicin A. However, the first strong complementary experimental evidence for this maturation mechanism was presented in two related pathways, small molecule structures of “prexenocoumacins” from the xenocoumacin antibiotic pathway [87] and shortly thereafter an X-ray crystal structure of a peptidase from the colibactin genotoxin pathway like those found in the zwittermicin and xenocoumacin pathways [88]. From the xenocoumacin pathway, a transmembrane peptidase XcnG was required to produce the active xenocoumacins. Deletion of this peptidase led to the characterization of a family of prexenocoumacins that are capped with an N-acyl-D-asparagine. The capped prexenocoumacins exhibited no detectable bioactivity. Due to the differences in antibiotic activity between prexenocoumacins and xenocoumacins, the maturation event was described as a “pro-drug activation mechanism” [87]. The complementary X-ray structure and biochemical analysis of the related peptidase ClbP in the colibactin pathway, showed that the transmembrane peptidase was necessary for genotoxicity and was located on the inner membrane facing the periplasm, supporting the cleavage of “precolibactins” in the periplasm during export [88]. The authors noted the key similarities between the colibactin pathway and the xenocoumacin/zwittermicin pathways, but the structures of precolibactins remained unknown [89]. ClbP-dependent N-acyl-D-Asparagines, e.g. 
N-myristoyl-D-Asn and ClbP precursor analogs, from the colibactin pathway were identified later in accordance with the above biosynthetic logic [54-56]. The function of this N-terminal cap is likely to protect the producing strain from genotoxicity. However, the liberated N-myristoyl-D-asparagine has also been shown to have biological activities in vitro, including weak bacterial growth inhibitory effects against Gram-positive bacteria and antagonistic activities against the serotonin-7 receptor and the dopamine-5 transporter [56]. Speculatively, these activities may contribute to bacteria-bacteria and/or host-bacteria interactions in the gut [56].

Fig. (1) legend: An untargeted network of a single E. coli organic extract (left) contains over 1500 molecular features (MoFs). Features are clustered based on the similarity of MS/MS spectra. Endogenous metabolites, media components, as well as metabolites of interest are included in this network. A targeted analysis fragmenting gene cluster-dependent features returns approximately 50 MoFs (right) with higher pathway coverage. The production of each of these features is dependent on the presence of a gene cluster of interest. Nodes are colored according to average ionization intensity, with white nodes being present at low abundance and black nodes being present at high abundance.

A variety of other NRPS/PKS molecules are enzymatically hydrolyzed during maturation. The didemnins and zeamines contain polypeptides that are removed during maturation. Didemnin B is a cyclic depsipeptide that has been investigated for use as an anticancer agent [90]. Didemnin X and Y contain the core structure of the active didemnin B with an additional N-terminal β-hydroxyl-polyglutamine cap [90]. These derivatives were isolated with the didemnin gene cluster in hand, enabling a comparative gene cluster – small molecule structural outcome correlation. Zeamine is a PKS/NRPS hybrid molecule that was originally isolated from Dickeya zeae. 
Pre-zeamine was isolated from Serratia plymuthica and contains a C-terminal pentapeptide that is proposed to be cleaved by a peptidase in the gene cluster [91]. Pyoverdine is a fluorescent siderophore produced by Pseudomonas aeruginosa that is required for virulence [92-93]. Periplasmic proteins are responsible for the maturation of the chromophore as well as cleavage of a myristoyl moiety [94-95]. In vitro reconstitution of the saframycin biosynthetic pathway revealed that it is synthesized with an N-terminal long-chain acyl group [96]. Amicoumacin [97], a compound from B. subtilis that is structurally related to xenocoumacin, also employs a hydrolytic maturation strategy in which an N-terminal acyl-asparagine or acyl-glutamine is removed upon activation [98-99]. As with xenocoumacin, the acylated compound is inactive, while the mature compound is a potent antibiotic. Fig. (2) summarizes currently reported NRPS and NRPS/PKS-derived structures that undergo enzymatic hydrolytic maturation. For a dedicated review of NRPS/PKS products that mature via enzymatic hydrolysis, the reader is also directed to a recent review by Bode and co-workers [78].

Fig. (2)

Known NRPS/PKS products that undergo enzymatic hydrolytic maturation.

The active cleaved metabolite is shown in black and the leader structural features, which may also have biological activity, are shown in grey. The cleaved bond is bolded. For precolibactin A, the proposed thiazoline and thiazole order and its stereochemistry were predicted by bioinformatics, tandem MS, and isotopic labeling studies, and further experimental evidence is needed.

Hydrolytic maturation of secondary metabolites is also well established beyond NRPS/PKS assembly lines. For example, cofactors used in primary metabolism can be incorporated into secondary metabolites and hydrolyzed into smaller structural units. We provide three striking examples: thienamycin, gliotoxin, and lincomycin (Fig. ). In contrast to the examples discussed above, these compounds retain part of the building blocks incorporated during biosynthesis, rather than using hydrolyzable motifs encoded in thiotemplate assembly line biosynthesis. In thienamycin biosynthesis, coenzyme A (CoA) is used as a source for cysteamine [100]. Notably, the phosphopantetheine arms between CoA and PKS/NRPS carrier proteins are shared, and polar CoA analogs produced inside of the cell typically remain sequestered in the intracellular environment. Two hydrolases associated with the thienamycin biosynthetic gene cluster are responsible for the stepwise hydrolysis of CoA to pantetheine [100]. A third hydrolase processes the carbapenem-pantetheine adduct into thienamycin, which is exported for antimicrobial defense/signaling. In other selected examples, redox-relevant cofactors such as glutathione and mycothiol are also used as sulfur donors in the biosynthesis of secondary metabolites. Gliotoxin, an epidithiodiketopiperazine produced by the fungus Aspergillus fumigatus, is a nonribosomal peptide virulence factor [101-102]. 
Deletion of individual genes in the gliotoxin cluster revealed that glutathione serves as an unusual sulfur donor [103]. A bisglutathione conjugate is initially formed after synthesis of the core. Subsequent hydrolyses truncate the adduct to a biscysteine conjugate. Cleavage of the C-S bond and oxidation to the disulfide result in mature gliotoxin. An analogous mechanism has been observed in the biosynthesis of the saccharide antibiotic lincomycin A produced by Gram-positive Streptomyces [104]. A mycothiol conjugate is hydrolyzed to release an N-acyl-cysteine conjugate. C-S bond cleavage and methylation follow to produce the final antibiotic. It is intriguing that the latter two examples exploit general cellular toxin detoxification strategies, the thiol nucleophiles glutathione and mycothiol, which typically neutralize electrophilic toxins, for channeling potentially toxic antibiotic intermediates. With genome sequence information now complementing structural characterization efforts, diverse maturation strategies are now emerging as more prominent routes for nonribosomal peptide processing. Moving forward, we expect more related nonribosomal peptide maturation strategies to emerge. With further mechanistic characterizations and a better understanding of the processing enzymes, the disconnection between expected structures and bioinformatics predictions will begin to close. Additionally, current bioinformatics approaches, which rely on homology to known systems or machine learning algorithms trained on known systems, still fail to identify putative “atypical” gene clusters. Because microbial biosynthetic gene clusters represent chemical traits and are well known to transfer from one organism to another via horizontal gene transfer [105], genome synteny analyses – examining the co-localization of genetic loci in phylogenetically-related organisms – continue to provide a promising route for the discovery of “atypical” natural product enzymes and pathways [28, 106]. 
The continued characterization of atypical secondary metabolic pathways, new biosynthetic enzymes, and new maturation strategies, and their subsequent integration into online bioinformatics programs, such as antiSMASH, is needed.

Nonribosomal peptide building blocks

NRPSs can sample from about 500 amino acid substrates, which provide a good variety of potential monomer building blocks for secondary metabolite structural diversification [66]. The continued characterization of these building blocks and their associated biosynthetic enzymes will aid in both nonribosomal and, increasingly, ribosomal peptide/protein engineering. While the gene cluster for colibactin was described in 2006, the recently proposed structure of precolibactin A, which accounts for a majority of the enzymatic domains in the biosynthetic pathway, contains an unexpected cyclopropane moiety [57]. This moiety was similarly found in a smaller precolibactin shunt product, where the cyclopropane was proposed to be derived from the amino acid building block 1-aminocyclopropane-1-carboxylic acid (ACC) [57-59]. In plants, ACC is an intermediate in the production of ethylene, a signaling hormone [107]. ACC is biosynthesized from S-adenosylmethionine in a PLP-dependent manner. In the proposed mechanism, nucleophilic displacement liberates methylthioadenosine and forms the three-membered ring [108]. This mechanism is paralleled in the synthesis of coronatine, albeit with a different leaving group [109]. Coronamic acid is a cyclopropyl-containing intermediate that is produced through a cryptic halogenation event. A carrier protein-tethered isoleucine is chlorinated at the γ-carbon by an α-ketoglutarate non-heme Fe2+-dependent oxygenase. Deprotonation of the α-carbon yields the enolate, which then displaces chloride, forming the cyclopropane moiety. Other variations of this mechanism are found in the biosynthesis of, for example, kutzneride 2 and curacin A [108]. None of the previously described mechanisms for the biosynthesis of cyclopropane substructures readily appear to be encoded in the colibactin gene cluster. The colibactin gene cluster lacks homologous PLP-dependent enzymes or Fe2+-dependent oxygenases. 
Isotope labeling studies indicate that the four carbons (aminobutyryl) are derived from methionine [57-58], which was also previously observed in the biosynthesis of the cytotrienin cyclopropyl moiety [110]. Since ACC is derived from methionine in bacteria, it is possible that free ACC is loaded onto the carrier protein; however, feeding studies with free deuterated ACC showed no detectable incorporation of the free amino acid in bacterial cell cultures [57]. One NRPS module with an unusual architecture is involved in the production of the spirobicyclic structural feature [57-58]. This protein, ClbH, contains an additional adenylation domain. In contrast to canonical NRPS modules that contain a condensation domain (C), followed by an adenylation domain (A), and then a thiolation domain (T), this particular protein has the architecture A-C-A-T. The second A-domain was speculatively proposed to convert a carrier protein-tethered Met into SAM for cyclization [58]. Exactly how isotopically labeled Met is processed to the ACC-derived feature in colibactin remains a subject of current investigation. It is not without precedent, however, for NRPS domains to directly catalyze the formation of ring-strained units, such as in the biosynthesis of the β-lactam nocardicin [84]. There, the condensation domain catalyzes the condensation with the upstream intermediate as well as β-lactam formation [84]. More recent studies on ClbH indicate that the A1 protein domain fragment activates serine in vitro as predicted by bioinformatics [111]. Free-standing carrier protein ClbE accepts this substrate in vitro, which is further oxidized to α-aminomalonate (detected as its decarboxylation product) by isolated dehydrogenases ClbD and ClbF. ClbG and ClbO, a discrete acyltransferase and PKS module, respectively, which are currently unaccounted for in colibactin biosynthesis, might be available for incorporation of this rare extender unit [111]. 
α-aminomalonate extender units can also be found in the zwittermicin [112], guadinomine [113], and lumiquinone antibiotics [114]. These genes are necessary for bacterial cells harboring the colibactin pathway to initiate mammalian cell DNA damage, suggesting that functional (pre)colibactin derivatives in the bacteria-host interaction may incorporate an α-aminomalonate substrate [111]. (Pre)Colibactins with this structural unit have not yet been described. Additionally, peptidase ClbL remains to be included in the colibactin biosynthetic model. However, suggestions have been made for ClbL’s potential involvement in a second cleavage event based on gene deletion analysis [58]. Many cyclopropane-containing compounds exhibit toxicity through covalent modifications of DNA. Release of the ring strain contained in cyclopropanes can contribute to their reactivity. Oftentimes, the formation of aromaticity accompanies ring opening for these irreversible reactions. Nucleophilic positions on guanine or adenine can attack the ring, particularly when the ring is positioned in conjugation with an α,β-unsaturated carbonyl, leading to DNA alkylation. For example, in duocarmycin, upon binding DNA, a conformational change positions the cyclopropane for attack by the N3 of adenine [108]. This leads to DNA alkylation and subsequent cytotoxicity. In the case of colibactin, an analogous alkylation reaction can occur (Fig. ). However, the proposed alkylated ring-opened intermediate contains a Michael acceptor (or an analogous conjugated iminium acceptor) (Fig. ). This could allow the second strand of the duplex DNA to attack, forming an interstrand crosslink between the two strands of DNA [57]. A model precolibactin shunt product, which served as a mimic of the open chain product, was shown to exhibit weak DNA interstrand crosslinking activity in in vitro assays, supporting this notion. 
Based on these data, we proposed several possible modes of action for colibactin toxicity (open-chain and closed-chain molecules), which are shown in (Fig. ) [57]. The Balskus group identified the same model compound and subjected it to a ClbP protease cleavage assay in Lysogeny Broth growth conditions (LB, 10 g/L NaCl). A mass consistent with Cl- addition to the cyclopropyl moiety of the closed-chain structure was detected. Based on these data, they similarly proposed a colibactin activity consistent with DNA alkylation (Fig. , bottom) [59]. In isotopically labeled minimal media, we did not detect Cl- adducts, but we could see masses consistent with model colibactin cleavage products in both the open-chain (primary amine) and closed-chain (imine) forms in freshly prepared organic extracts [57]. Alkylation and crosslinking activities can result in mutagenesis, activation of apoptotic pathways, and downstream DNA double strand breaks [115]. 
In considering the above possible modes of action, several structural and biosynthetic features need to be taken into account. As discussed earlier, during export out of the cell, precolibactins are cleaved, releasing a primary amine. The amine would largely be protonated under physiological conditions and may help colibactin bind DNA. Alternatively, the amine could undergo an intramolecular cyclization, forming a reversible cyclic imine (Fig. and , bottom). The pKa of its conjugate acid, the iminium species, is calculated to be approximately 5. With a nuclear pH of 7.2, less than one percent of the imine would be expected to exist as the reactive iminium in the nucleus. While its neutrality may reduce interaction with the DNA phosphate backbone, it may alternatively participate in DNA binding, and transient protonation (or DNA binding-induced protonation, pKa perturbation) of the imine may serve to activate the cyclized warhead and promote attack by DNA. 
Additionally, a cyclopropane-containing metabolite arising from an alternative cyclization mode featuring a pyridone scaffold was recently proposed from the colibactin pathway by MS (Fig. and Fig. ) [61]. This compound was isolated in low yields (0.1 mg/200 L) from a ΔclbP strain overexpressing the colibactin pathway. In a wildtype strain, protease ClbP would cleave the N-acyl-D-Asn moiety, and consequently, two competing cyclization routes can explain the structural differences (Fig. ). As no functional data were reported for the pyridone-containing colibactin metabolites, it is currently unclear if these molecules are stable shunt metabolites or advanced biosynthetic products (i.e., precolibactin B?). Precolibactin A has a predicted thiazolinyl- and thiazole-containing moiety that presumably participates in DNA binding and poises the ClbP-cleaved warhead (open- or closed-chain) for electrophilic attack [57]. A bithiazole-containing structure from the colibactin pathway has also been proposed (Fig. ) [61]. Bleomycin, an antitumor compound produced by Streptomyces, contains a C-terminal bithiazole and is capped with a C-terminal cationic group, whereas phleomycin contains a thiazolinyl-thiazole moiety similarly capped with a cationic group [116]. These structural motifs are required for the action of bleomycin and phleomycin, presumably by mediating DNA binding [117]. 
Fig. (). The colibactin warhead contains a ring-strained cyclopropane. Nucleophilic attack by DNA could lead to opening of the ring and DNA alkylation (a). Cleavage of precolibactin liberates a primary amino group, which may be involved in DNA binding or in modulating the warhead activity by forming a cyclic imine (bottom). A Michael addition (or analogous iminium addition) into the alkylated warhead could result in a DNA interstrand crosslink (b). Formation of alternative cyclization products (pyridones) could compete with the five-membered cyclic imine cyclization route. No biological activities have yet been reported for the stable pyridone-containing molecules (c). 
While the diverse colibactin products proposed to date may have distinct activities, further mechanistic studies are required to assess the dominant molecular route(s) for colibactin genotoxicity in mammalian cells.
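
The estimate that less than one percent of the cyclic imine exists as the iminium can be checked with the Henderson-Hasselbalch relationship, using the values quoted in the text (a pKa of approximately 5 and a nuclear pH of 7.2). The following is a minimal sketch, not part of the original analysis; the function name `protonated_fraction` is hypothetical, and the calculation assumes ideal dilute-solution behavior with no DNA binding-induced pKa perturbation.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a base present as its conjugate acid at a given pH.

    Rearranged Henderson-Hasselbalch: [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Values taken from the text: iminium pKa of about 5, nuclear pH of 7.2.
iminium_fraction = protonated_fraction(pka=5.0, ph=7.2)
print(f"{iminium_fraction:.2%}")  # prints 0.63%
```

Under these assumptions roughly 0.6% of the species is protonated, consistent with the "less than one percent" figure. Because the fraction changes tenfold per pKa unit in this regime, a modest local pKa shift upon DNA binding would change the iminium population substantially.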

CONCLUSION

The continued expansion of microbial genome sequence information holds enormous promise for novel biosynthesis and secondary metabolite discovery. Based on abundant historical precedent, the novel structural scaffolds to come will continue to serve as leads for the development of new-in-class (ant)agonists, molecular probes, and pharmaceuticals. Discovery and characterization of novel biocatalysts will continue to expand our arsenal of biocatalytic reactions while assigning functions to the unannotated majority of hypothetical proteins. Multidisciplinary and systematic genes-to-molecules approaches have been effective at aiding in these overall efforts. In particular, molecular networking has provided a route to begin to overcome current metabolite database inadequacies, especially for groups focused on novel secondary metabolite discovery from functional or activated pathways. Pathway-targeted molecular networking, as highlighted for the genotoxic colibactin pathway, enables a detailed systems biosynthesis-level view of a targeted secondary metabolic pathway within a complex metabolome, and ties into existing molecular networking workflows. The colibactin pathway provides a nice example of the sequence-first, structures-to-follow paradigm in the human microbiome. Deciphering the unexpected structural and functional outcomes for the colibactin pathway and determining its mode of action in vivo remain subjects of rigorous inquiry.
